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SUMMARY 

Q | Topography and gravity are geophysical fields whose joint statistical structure derives from 

D . interface-loading processes modulated by the underlying mechanics of isostatic and flexural 

compensation in the shallow lithosphere. Under this dual statistical-mechanistic viewpoint an 
rya , estimation problem can be formulated where the knowns are topography and gravity and the 

O ■ principal unknown the elastic flexural rigidity of the lithosphere. In the guise of an equivalent 

c/3 ' "effective elastic thickness", this important, geographically varying, structural parameter has 

been the subject of many interpretative studies, but precisely how well it is known or how best 
—< ■ it can be found from the data, abundant nonetheless, has remained contentious and unresolved 

throughout the last few decades of dedicated study. The popular methods whereby admittance or 
coherence, both spectral measures of the relation between gravity and topography, are inverted 
' for the flexural rigidity, have revealed themselves to have insufficient power to independently 

\ constrain both it and the additional unknown initial-loading fraction and load-correlation fac- 

^f) • tors, respectively. Solving this extremely ill-posed inversion problem leads to non-uniqueness 

t**"* ' and is further complicated by practical considerations such as the choice of regularizing data 

C""'- tapers to render the analysis sufficiently selective both in the spatial and spectral domains. Here, 

we rewrite the problem in a form amenable to maximum-likelihood estimation theory, which 
ly-) | we show yields unbiased, minimum-variance estimates of flexural rigidity, initial-loading frac- 

f^*) . tion and load correlation, each of those separably resolved with little a posteriori correlation 

CNl ' between their estimates. We are also able to separately characterize the isotropic spectral shape 

t— I \ of the initial-loading processes. Our procedure is well-posed and computationally tractable for 

the two-interface case. The resulting algorithm is validated by extensive simulations whose 
• i-H ' behavior is well matched by an analytical theory with numerous tests for its applicability to 

, real-world data examples. 
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1 INTRODUCTION AND MOTIVATION 

With a remarkable series of papers, all entitled Experimental Isostasy, Dorman and Lewis heralded in an era of Fourier-based estimation in 
geophysics, using gravity and topogr aphy to study isostasy "experimentally", that is, w it hout first assuming a pa rticular mechanistic model 
such as Airy or Pratt compensation dDorman & Lewis|[l970t iLewis & Dorman|[l970allbl ; IDorman & Lewi sl ll972h . All three papers remain 
essential reading for us today. 

The first in the series introduced the basic point of view by which Earth is regarded as a linear time-invariant system and the unknown 
"isostatic response" is the transfer function: 

The linear system here is the earth: The input is the topography, or more precisely, the stress due to the topography across some imaginary surface, say sea 
level, and the output is the gravity field due to the resulting compensation. iDorman & Lewisll970l p. 3360.) 

In keeping with classical systems iden tification practice, or in their words, through the fruits of linear mathematics, in particular, harmonic 
analysis and the convolution theorem dDorman & Lewis! 197(1 p. 3358), the recovery of the impulse response practically suggested itself: 

If the earth is linear in its response to the crustal loading of the topography, the response of the earth 's gravity field to this loading can he represented as 
the two-dimensional convolution of the topography with the earth's isostatic response function. [...] Through transformation into the frequency domain, the 
convolution becomes multiplication, and one is led directly to the re sult that the isostatic resp onse function is equal to the inverse transform of the quotient of 
the transforms of the Bouguer gravity anomaly and the topography. IDorman & Lewis! 1970j p. 3357.) 
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Contingent upon establishing the validity of the linear assumption in interpreting the data, subsequently, the isostatic response function was 
to be "inverted", i.e. by computing the density changes at depth that would be required to fit the experimentally determined response function 
dPorman & Lewislll97d p. 3361). However, due to various forms of measurement, geo logical or modeling "no ise", [t]he problems involved 
in computing the inverse [...] of an experimentally determined function are formidable dDorman & Lewislll97(X p. 3361), even when strictly 
local compensation is a ssumed and the solution i s, in principle, unique. 

The second paper dLewis & Dormanlll970allbT) was devoted to discussing the numerous geophysical and numerical strategies by which 
the least-squares inversion of the experimentally derived response can be accomplished at all. Broadly speaking, these involve any or all of 
(a) modification of the data, e.g. by windowing prior to Fourier transformation, (b) modification of the recovered response, e.g. by averaging, 
smoothing, or limiting the frequency interval of interest, (c) conditioning of the unknown density profile, e.g. by series expansion or imposing 
hard bounds, and (d) stabilizing the inversion, e.g. by iteration, frequency weighting, or the addition of minimum <■ i norm constraints on the 
density profile. As a result, many possible local density profiles can be found that "explain", in the £2 sense, the observed response curves, and 
an appeal has to be made to independent outside information, e.g. from seismology and geodynamics, to make the final selection. Regardless 
of the ultimate outcome of this exercise in deciding over which depth the compensating mass anomalies occur, the modeling procedure 
allows for the computation of the so-called "isostatic anomaly". The latter is thereby defined as that portion of the variation in the observed 
terrestrial gravity field that cannot be explained by the difference in measurement position on or above the reference geoid (which leads to 
the free-air anomaly), nor of the anomalous mass contained in the topography above the reference geoid (hence the Bouguer anomaly) — 
but, most importantly, also n ot by the assumption of a linear isostatic compensation mechanism, a t whichever depth or however regio nally 
this is being accommodated dLambecM ll988l;lB lakelv 199l lWattsl200ll ; lTurcotte & Schub ert 2002; Hofmann-Well enhof & Moritzll2 006). 

In their third and final paper ( Dorman & Lewislll972l) the authors employed iBackus & Gilbert! dl97fj|) theory to obtain and interpret 
the result of the inversion of isostatic response functions by way of depth-averaging kernels rather than solving for particular profiles, 
which had shown considerable non-uniqueness and possibly unphysical oscillations. But even admitting that only localized averages of 
the anomalous density structure could be considered known, the authors concluded that the available data called for the compensation of 
terrestrial topography by density variations down to at least 400 km depth, i.e. involving not only Earth's crust but also its mantle. 

If in these papers the main objective was to make isostatic anomaly maps and to recover local density variations at depth to explain the 
cause of isostasy where p ossible, to do the latter reliably arguments needed to be made that involve the strength of the crust and upper mantle 
dLewis & Dormanlll970al p. 3371). In practice, this led the authors to decide that t he constitution of the eart h is such that it is at least able 
to support mass anomalies of wavelengths equal to the depth at which they occur ( Lewis & Dorman 1970a, p. 3383). This contradictio in 
terminis (it is no longer a strictly local point of view) was the very one that led lVening Meineszl dl931 ) to ar gue against the hypotheses o f 
Airy and Pratt: strength implies lateral transfer of stress which is incompatible with the tenets of local isostasy dLamb eck 1988; Watts j200lh . 

Follow i ng a similar line of reasoning in replacing local by regional compensation mechanisms, McKenzie & Bowij tl97(j) and 
Bank s~et"al] d 19771) prese nted a new th eoretical framework by which the observed admittance, indeed the ratio of Fourier- domain gravity 
anomalies to topography (Karner 1982), could be interpreted in terms of a regional compensation mechanism that involves flexure of a thin 
(compared to the wavelength of the deformation) elastic plate (a "lithosphere" defined in its response to long-term, as opposed to seismic 
stresses) overlying an inviscid mantle (an "asthenosphere", again referring to its behavior over long time scales). No longer was the lo- 
cal density structure the driving objective of the inversion of the isostatic response curve, but rather the thickness over which the density 
anomalies could plausibly occur, assuming a certain limiting mantle density. This subversion of the question how to best explain gravity and 
topography data became the no w dominant quest for the determination of t he flexural rigidity or strength, D, of the lithosphere thus defined. 
The theory of plates and shells (Timoshenko & Woi nowskv-Kri eger 1959) could then be applied to translate D into the "effective" elastic 
plate thickness, T e , u pon the further assumption of a Young's modulus a nd Poisson's ratio. A tripartite study entitled An analysis of isostasy 
in the world's oceans (Watts 1978; Cochran 1979; Detrick & Watts 1979) went around the globe characterizing T e in a plate-tectonic context. 
Subsequent additions to the theory involved a few changes to the physics of how deformation was treated, e.g. by considering that the iso- 
static resrxmse may be anisotropic (Stephe nson & Beaumont! 1980l). taking into account non-linear elasticity and finite-amplitude topography 
dRibell 19821), visco-elasticity and erosional feedbacks dStephenson| [T984). and updating the force balance to include also lateral, tectonic, 
stresses (Stephe nson & Lambeckfl985T) . None of these considerations changed the basic premise. With the methodology for effective elastic 
thickness determination firmly established, the way was paved for its rheological interpretation (e.g. lMcNutt & M enard 1982; M cNutJ 19841 : 
iBurov & Diamentlll995l) . 

A firs t hint that not all was w ell in the community came when transfer function theory was applied to measure the strength of the 
continents. Mc Nutt & Parked dl978l) concluded from admittance analysis that, on th e whole, Australia (an old continent) might not have any 
strength, and would thus be in complete local isostatic equilibrium. On the contrary, IZuberetal.ldl989l) concluded on the basis of coherence 
analysis that the Australian continental effective elastic thickness well exceeded 100 km. This apparent contradiction was found despite 
the observed admittance and coherence being merely different "summaries" of gravity and topography: spectral ratios that both estimate the 
underlying isostatic tr ansfer function . At le ast part of the discrep ancy could be ascribed to the treatment of subsurface loads in the formulation 
of the forward model (Forsyth 19851). With lBechtel et al.l(ll990h . and nume rous others after them, t hese a uthors led the next decade in which a 
"thick" (greater than 100 km) continental lithosphere was espoused. Then, McK enzie & Fairhead d 19971) started a decade of making effective 
arguments for "thin" continents (no more than 2 5 km), a controversial position with many ramificatio ns dJacksonll2002l : lBurov & Wattsl2006l) 
that was hotly contested and remains so today dBanks et alj200ll ; ISwain & KirbvJl2003b] : lMcKenzidl2003l . l201Cl) . 

Three developments happened on the way to the current state, with sound arguments made on both sides of the debate. Inverting 
coherence between Bouguer gravity and topography yielded thicker lithospheres than working with the admittance between the free-air 
gravity and the topography. There was discussion over the treatment of "buried loads" and how to solve for the subsurface-to-surface loading 
ratio. Finally, there were argument s over the best way by which to for m spectral estimates o f either admittance or coherence. Among 
others, IPerez-Gussinve et alTd2004l) . IPerez-Gussinve & Watts! d2005l) and lKirbv&Swaird d2009l) provided some reconciliation by making 
estimates of effective elastic thickness that were based on both free-air admittance and Bouguer coherence, respectively. They argued the 
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equivalence of the results when either method was applied in a "consistent" formulation, taking into account the finite window size of 
any patch of available data. Still, large differences remaine d, experime nts on synthetic data showe d sign ificant bias and large variance, 
and a clear consensus failed to arise. Macario et al 1 dl995h . I McKenzi ] 120031) and Kirbv & Swain 1 2009) i nvestig a ted the effect of the 
statis ti cal correlation between su r face and subsurface lo a ds. For their part. iDiamentl (1 1985), Lo wrv & Smith! ( 119941) . ISimons et alj d200d 
120031) . lOieda & Whitmanl d2002h . iKirbv & Swair] 12004 l2008d fbh and lAudet & Mareschall J2007I) focusedon the spectral estimation of 
admittance and coherence via maximum-entropy, multitaper and wavelet-based methods, and identified the spectral bias, leakage and variance 
inherent in those. Much as the controversy involved the geological consequences of a thick versus a thin lithosphere, with only gravity and 
topography as the primary observations and no significant divergence in viewing the physics of the problem, that is, of elastic flexure 
in a multilayered system, over time the arguments evolved into a debate that was mostly about spectral analysis. Least-squares fitting of 
admittance and coherence functions, however determined, had become synonymous with the process of elastic-thickness determination. 

The appropriateness of using least sq uares is not something th a t can be taken for granted but rather needs to be care full y asse s sed, a s 
was pointed out early on in this context bv lDorman & Lewisl ( fl972h , lBanks et ail dl977» . IStephenson & Beaumontl fl980h and lRibd dl982l) . 
which, however, also focused on other issues that have since received more attention. Admittance and coherence are "statistics": functions 
of the data with non-Gaussian distributions even if the data themselves a re Gaussian (Munk & Cartwright 1966; barter etai]|l973l : IWaldenl 
Il99d : iThomson & Chavelll99ll : Ifouzi & Lopesll 19961: Ifouzi et aljfl999h . Estimators for flexural rigidity based on any given method have 
their own distributions, though not necessarily ones with a tractable form. Without knowledge of the joint properties of admittance- and 
coherence-based estimators it is impossible to assess the relative merits of any method for a given data set or true parameter regime; with 
current state-of-the-art understanding it is not even clear if the two methods are statistically inconsistent. 

At this juncture this paper aims for a return to the basics, by asking the question: "What information does the relation between gravity 
and topography contain about the (isotropic) strength of the elastic lithosphere?" and by formulating an answer that returns the full statistical 
distribution of the estimates derived from such data. As such, it should provide a framework for the interpretation of the early work on which 
we build: as others before us we are merely using the measurable ingredients of gravity, topography and the flexure equations. However, as 
we shall see, we do not need to consider this a two-step process by which first the transfer function needs to be estimated non-parametrically 
and then the inversion for structural parameters performed with the estimated transfer function as "data". This approach amounts to a loss 
of most of the degrees of freedom in the data, replacing them with spectral ratios estimated at a much smaller set of wavenumbers, and 
with much of the important information on the flexural rigidity compromised due to lack of resolution at the low wavenumbers. Rather, we 
can treat it as an optimization problem that uses everything we know about gravity and topography available as data to directly construct 
a maximum-likelihood solution for the lithospheric parameters of interest. These are returned together with comprehensive knowledge of 
their uncertainties and dependencies, and with a statistical apparatus to evaluate how well they explain the data; the analysis of the residuals 
then informing us where the modeling assumptions were likely violated. By the principle of functional invariance the maximum-likelihood 
solution for elastic thickness and loading ratio also returns the maximum-likelihood estimates of the coherence and admittance themselves, 
which can then be compared to those obtained by other methods. Admittance may be superior to coherence, or vice versa, in p articular 
scenarios, but only maximum-likelihood, b y definition, produces solutions that are preferred globally for all parameter regimes dPawitanl 
l200ll ; ISeverinill200ll ; I Young & Smithll2005t). Finally, we note that unders tanding the likelihood is also a key component of fully Bayesian 
solution approaches (e.g. Mosegaard & Tarantola 1995; Kaipio & Somersalo 2005). 



2 BASIC FRAMEWORK 

Despite their singular focus on deriving density profiles to reconstruct the portion of the Bouguer gravity field that is linearly related to the 
topography and thereby "explain" the isostatic compensation of surface topography to first order, even when the strength of the lithosphere 
had to be effectively prescribed, Dorman and Lewis' Experimental lsostasy 1, 2 and 3 containe d virtually all of t he elements of the analysis 
of gravity and topography by which the problem could be turned around to the, in the words of iLambeck] ( 1 19881) "vexing", question "What 
is the flexural strength of the lithosphere"? The elements applicable to the analysis were the expressions for admittance and coherence 
between topography and the Bouguer, free-air, and isostatic residual gravity anomalies, the averaging or smoothing required to statistically 
stabilize the estimate of the transfer function that is the intermediary between the data and the model obtained by inversion for the unknown 
parameters (if not the density distribution, then the mechanical properties of the plates), the notion of correlated and uncorrelated noise of 
various descriptions: indeed all of the ingredients that will form the vernacular of our present contribution. In this section we redefine all 
primary quantities of interest in a manner suitable for the statistical development of the problem. 

We treat Earth locally as a Cartesian system. Our chosen coordinate system has x = (xi, X2) in the horizontal plane and defines z 
pointing up: depths in Earth are negative. A density contrast located at interface j is found at depth Zj < 0, and is denoted 

-V- Pi /', i- (1) 

Two layers is the minimum required to capture the full complexity of the general problem which may, of course, contain any number of 
layers. In a simple two-layer system, the first interface, at z\ — 0, is the surface of the solid Earth, and po is the density of the air (or water) 
overlying it. The density of the crust is pi, and the second interface, at Z2 < 0, separates the crust from the mantle with density p2. 

For now we use the term "topography" very generally to describe any departure from flatness at any surface or subsurface interface. 
By "gravity" we mean the "anomaly" or "disturbance"; both are differences in gravitational acceleration with respect to a certain reference 
model. These departures in elevation and acceleration are all small: we consider topography to be a small height perturbation of a constant- 
depth interface, and neglect higher-order finite-amplitude effects on the gravity. We always assume that the "loads", the stresses exerted by 
the topography, occur at the density interfaces and not anywhere else. If not in the space domain, x, we will work almost exclusively in the 
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Figure 1. Synthetic data representing the standard model, identifying the initial, Hj, equilibrium, 'Hij , and final topographies, Hoj , emplaced on a lithosphere 
with flexural rigidity D. The initial loads were generated from the Matern spectral class with parameters a 2 , p and v\ they were not correlated, r = 0, and the 
spectral proportionality was / 2 . Also shown, by the black line, is the Bouguer gravity anomaly, Go2- The density contrasts used were Ai = 2670 kgm -3 
and A2 = 630 kgm -3 , respectively. All symbols and processes are clarified in the text. They will furthermore be identified and briefly explained in Table[T] 



Fourier domain, using the wave vector k or wavenumber (spatial frequency) k = ||k|l . We only d istinguish between both domains when we 
need to, and then only by their argument. All of this corresponds to standard practice (Watts 

Looking ahead we draw the readers' attention to Fig.Q] which contains a graphical representation of the problem. Fig. [T]is, in fact, the 
result of a data simulation with realistic input parameters. Many of the details of its construction remain to be introduced and many of the 
symbols remain to be clarified. What is important here is that we seek to build an understanding of how, from the observations of gravity and 
topography, we can invert for the flexural rigidity of the lithosphere in this two-layer case. The observables (rightmost single panel) are the 
sum of the flexural responses (middle panels) of two initial interface-loading processes (leftmost panels) which have occurred in unknown 
proportions and with unknown correlations between them. 



2.1 Spatial and spectral representation, theory and observation 

Writing H and Q without argument we will be referring quite generically to the random processes "topography" and "gravity" respectively, 
though when we consider either physical quantity explicitly in the spatial or spectral domain we will distinguish them accordingly as 

'H(x) or <5(x) (in space), and dH(k) or dQ{k) (in spectral space), (2) 



where they depend on spatial position x or on wave vector k, respectively. In doin g so we use to the Cramer d 19421) spectral representation 
under which dH(k) and dQ(k) are well-defined orthogonal increment processes ferillingei|[l975l ; iPercival & Waldenlll993l) . in the sense 
that at any point in space we may write 

W(x) = J J e lkx dW(k) and Q{x) = J J e lkx d£(k). (3) 

We make the assumption of stationarity such that for every point x under consideration all equations of the type ((3} are statistically equivalent. 
We further assume that both processes will be either strictly bandlimited or else decaying very fast with increasing wavenumber k — ||k| 
such that we may restrict all integrations over spectral space to the Nyquist plane k £ [— it, n] x [— n, n] . While this is certainly a geologically 
reasonable assumption we would at any rate be without recourse in the face of the broadband bias and aliasing that would arise unavoidably 
if it were violated. For simplicity x maps out a rectangle that can be sampled on an M x TV ~ 2K grid given by 

x={(m,n):m = 0,...,M-l;n = 0,...,/V-l}. (4) 

In the non-rarified world of geophysical data analysis we will not be dealing with stochastic processes directly, rather with particular realiza- 
tions thereof. These are our gravity and topography data, observed on finite domains, to which we continue to refer as 'H(x) and Q (x). The 
modified Fourier transform of these measurements, obtained after sampling and windowing with a certain function tiij;(x), is 
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J ff(k) = ^™ K (x)-H(x)e- ,;k x = ^u- K (x)^ JJ e lk '' x dH(k')^ lk x = jj W K {k - k') dW(k'). (5) 
In this expression l%(k) is the unmodified Fourier transform of the energy-normalized applied window, 

W K (k) = ^ Wif (x)e- lk ' x . (6) 

X 

The spectral density or spectral covariance of continuous stationary processes is defined as the ensemble average (denoted by angular 
brackets) 

(dH(k.)<m*(k')) =5« w (k)dkdk'5(k,k'), (7) 

whereby we denote complex conjugation with an asterisk and 5(k, k') is the Dirac delta function. There can be no covariance between 
non-equal wavenumbers if the spatial covariance matrix is to be dependent on spatial separation and not location, as from eqs ([3j and {7} 

(W(x)«V)> = JJJJ ^ k,x e-*'- x ' {dH(k)dH*(k')} = ^e* (x - x ' ) 5„ w (k)dk = C„„(x-x'). (8) 

In contrast to eq. Q, as follows readily from eqs l(5) and Q, the covariance between the modified Fourier coefficients of the finite sample is 

(Jf(k)JT(k')> = JJJJ W JC (k-k")Wi(k'-k" / )(dW(k")d«(k'")> = JJ W K {* - k")W£(k' - k")5««(k") dk". (9) 

Eqs l(5j and ((9) show that the theoretical fields tfH(k) and their spectral densities Snufe) are out of reach of observation from spatially 
finite sample sets. Spectrally we are always observing a version of the "truth" that is "blurred" by the observation window. Even if, or rather, 
especially when the windowing is implicit and only consists of transforming a certain rectangle of data, this effect will be felt. For example, 
whereas the true spectral density is obtained by Fourier transformation of the covariance at all lags, denoted by the summed infinite series 

<Wk) = ^e- lk - y C w „(y) = // ^ e - l(k ' k ' ) y 5^(k')dk'= // 5(k,k')<Wk')dk', (10) 



a blurred spectral density is what we obtain after observing only a finite set, denoted by the summed finite series 

«W(k) = ^ e - k " C „H(y) = -i^^ e -*' x e lkx ' /7 e *' x e- lk '' x '5„„(k')dk' (11) 
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(12) 



^EE e " i(k " k ' ) ' Xel<k " k ' ) ' X ' 5 ««( k ') dk ' = jj \FK(k-k')\ 2 Snn(k')dk\ 

x x' 

with j-Fjc | 2 denoting Feier's kernel (Percival & W alden 1993). The design of suitable windowing functions (in this geophysical context, see, 
e.g., ISimons et al.ll200q 120031 : i imons & Wand 120 lib , is driven by the desire to mold what we can calculate from the observations into 
estimators of these "truths" that are as "good" as possible, e.g. in the minimum mean-squared error sense; we will keep the windows or 
tapers iok(x) and the convolution kernels Wx(k) generically in all of the formulation. For the gravity observable, whose spectral density is 
denoted Sgg, we find the modified Fourier coefficients and the spectral covariance, respectively, as 

G(k) = // Wx(k-k')d0(k') a nd (G(k)G*(k')> = // W K (k - k")W*(k' - k")<S ss (k") dk". (13) 



Finally, we will need to sample H(k), Wic{k), and G(k) on a grid of wavenumbers. Exploiting the Hermitian symmetry that applies in the 
case of real-valued physical quantities, for an M x N data set we select the half-plane consisting of the K = M x ( [N/2\ + 1) wave vectors 



{( 



2tt 
M 



M 
T 



+ m 



2tt 

n\ : m = 0, . . . , M — 1 : n = 0, 



' N 



(14) 



The quantities H(k), Wjf(k), and G(k) are complex except at the dc wave vectors (0, 0) and the Nyquist wave vectors (0, tt), (— n, 0) and 
{—it, 7r) if they exist in eq. H41 , which depends on the parity of M and N. 



2.2 Topography 

As mentioned before, we apply the term "topography", H, generically to any small perturbation of the Cartesian reference surface, which 
is assumed to be flat. Specifically, we need to distinguish between w hat we shal l call ' initi al', 'equilibrium' an d 'final' topographies, re- 
spectively. In the classic multilayer loading scenario reviewed by, e.g. jMcK enzi ] 120031) and lSimons etld] 120031) . as the jth interface gets 
loaded by an initial topography, the singly-indexed quantity T-Lj, a configuration results in which each of the interfaces expresses this loading 
by assuming an equilibrium topography, which is identified as the double-indexed quantity Hij . The first subscript refers to the interface on 
which the initial loading occurs; the second to the interface that reflects this process. The state of this equilibrium is governed by the laws 
of elasticity, as we will see in the next section. All of these equilibrium configurations combine into what we shall call the final topography 
on the jth interface, namely Hoj, where the o is meant to evoke the summation over all of the interfaces that have generated initial-loading 
contributions. 
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Thus, in a two-layer scenario, what in common parlance is called "the" topography, i.e. the final, observable height of mountains and 
the depth of valleys expressed with respect to a certain neutral reference level, will be called 7*oi, and this then will be the sum of the two 
unobservable components Tin and 7*21. In other words, the final "surface" topography is 

Hoi =7*11+7*21. (15) 

Likewise, the final "subsurface" topography, 7*o2, is given by the sum of two unobservable components 7*12 and 7*22, totaling 

7*o2=7*i2+7*22. (16) 

This last quantity, 7*o2, is not directly observable but can be calculated from the Bouguer gravity anomaly, as we describe below. Both 7*n 
and 7*12 refer to the same geological loading process occurring on the first interface but being expressed on the first and second interfaces, 
respectively. In a similar way, 7*21 and 7*22 refer to the process loading the second interface which thereby produces topography on the first 
and second interfaces, respectively. 

While postponing the discussion on the mechanics to the next section it is perhaps intuitive that a positive height pertur bation at on e 
interface creates a negative deflection at another interface: "mountains" have "roots", as has been known since the days of Airy dWattsl200lh . 
The initial-loading topography, then, is given by the difference between these two equilibrium components. At the first and second interfaces, 
respectively, we will have for the initial topographies at the surface and subsurface, respectively, 

7*i = 7*n-7*i2, (17) 
7*2 = 7*22 — 7*2i- (18) 

The sum of all of the equilibrium topographies, at all of the interfaces in this system and thus requiring two subscripts o, is given by 
7*oo=7*ii+7*i2+7*2i+7*22, (19) 
which is a quantity that we can only access through the free-air gravity anomaly that it generates, as we shall see. 

2.3 Flexure 

Mechanical equilibrium exists between 7*n and 7*12 on the one hand, and 7*21 and 7*22 on the other. The equilibrium refers to the balance 
between hydrostatic driving and restoring stresses, which depend on the density contrasts, and the stresses resulting from th e elastic strength 
of the lithosphere. Introduc ing the flexural rigidity D, in units of Nm, we obtain the biharmonic flexural or plate equation teankset all 19771 : 
Turcottc & Schubert! 19821) as follows on the first (surface) interface: 

(V + ip) 7* 12 (x) = -^ u (x), (20a) 
and at the second (subsurface) level, we have 

V 4 + ^)7*21(X) = -^7*22(X). (20b) 

The mechanical constant D is the objective of our study: geologically, this yields to what is commonly referred to as the "integrated strength" 
of the lithosphere, which can be usefully interpreted under certain assumptions as an equivalent or "effective" elastic thickness. This quantity, 
T e , in units of m, relates to D by a simple scaling involving the Young's modulus E and Poisson's ratio, v, as is well known (e.g. lRanallil 
1 19951: IWattsl200ll; iKennett & Bungel 2008 ). Here we follow these authors and simply define 

D = — (21) 

12(1 ~u 2 )- ( ' 

Much has been written about what T e rea l ly "means" in a geo logical context dLowrv & Smith! Il994t iBurov & Diamentj 1 19951: 
lLowrv & Smitr]| 19951 : iMcKenzie & Fairheadfl997l : IBurov & Watts! 20061) . This discussion remains outside of the scope of this study. More- 
over, eqs d20t are the only go verning equations that we s hall consider in this problem. It is n ot exact (e.g. McKenzie & Bowin 1976; Ribe 
19821). it is not complete (e.g. ITurcotte & Schubertlll982l) . and it may not even be r ight (e.g. lKarnerl[l9 82; Stephe nson & Lambeckj l 1 
McKenzie 2010). F or that matter, a single, isotropic D may be an oversimplification dStephenson & Beaumontlll980l : lLowrv & Smith! *! 



Simo ns et alJl200Ct 120031 : lAudet & Mareschalll2004l : ISwain & Kirby||2003bl : iKirbv & Swainll2006h . However, the neglect of higher-order 
terms, additional tectonic terms in the force balance, time-dependent visco-elastic effects and elastic anisotropy remain amply justified on 
geological grounds. It should be clear, however, that any consideration of such additional complexity will amount to a change in the governing 
equations ( 120b . which we reserve for further study. 

At the surface, eq. ([20a) is solved in the Fourier domain as 

d7*i 2 (k) = -dWnOOA!^ 1 *-^*), (22) 

where we have defined the dimensionless wavenumber-dependent transfer function baptized by iForsvthld 19851) 

DA- 4 

f(*) = l + -£-. (23) 
3A2 



At the subsurface, eq. d20b> has the solution 

d7*2i(k) = -d7*2 2 (k)A 1 " 1 A 29 i- 1 (fc) J (24) 
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with the dimensionless filter function 

0(fc) = l+^. (25) 

All of the physics of the problem is contained in the equations in this section. As a final note we draw attention to the assumption that 
the interfaces at which topography is generated and t hose on which the resulting deformation is expresse d coincide: this is the first of th e 
important simplifications introduced by iFors vthl (fl 9 8 5 ) . This assumption, though not universally made (e.g. McNutt 1983; Banks et alj200lh . 
is broadly held to be valid. Finding D in this context is the estimation problem with which we shall concern ourselves. 

2.4 Gravity 

Every perturbation from flatness by topography generates a corresponding effect on the gravitational acceleration when compared to the 
reference state. We relate the gravity anomaly to the disturbing topography by the density perturbation Aj and account for the exponential 
decay of the gravity field from the depth Zj < where it was generated. The "free-air" gravitational anomaly dHofmann-Weilenhof & Moritzl 
2006) from the topographic perturbation at the jth interface that results from the ith loading process is given in the spectral domain by 

dg iJ (k) = 2irGA J dH lJ (k)e kz \ (26) 

where G is the universal gravitational constant, in m 3 kg -1 s~ 2 , not to be confused with the g ravity anomaly itself. Once again this equation is 
inexa ct in assuming local Cartesian geometry dTurcotte & S chubert 1982; McKenzie 2003) and neglecting higher-order finite-amplitude ef- 



fects (Parker 1972; Wieczorek & Phillips 1998), but for our purposes, this "infinite- slab approximation" will be good enough. The observable 
free-air anomaly is the sum of all contributions of the kind (I26t , thus in the two-layer case 

dg oa (k) = dQn (k) + dg 12 (k) + dg 2 i (k) + dg 22 (k) . (27) 

The Bouguer gravity anomaly is derived from the free-air anomaly by assuming a non-lat erally varyi ng density contrast across the surface 
interface. It thus removes the gravitational effect from the observable surface topography (IBlakelvlll995l) . and is given by 

dg o2 {k) = dg 12 {k) + dg 22 (k) (28) 

= 2nGA 2 e kZ2 dH o2 (k) (29) 

= 2nGA 2 e kZ2 [dHi 2 (k) + dH 22 (k)] (30) 

= -2nGA 2 e kZ2 [dHiiCkOAiA^C^ft) - dH 22 {\C)] . (31) 
In this reduction, we have used eqs (|26M27|l. ( 1 16t and d22t . For simplicity we shall write the Bouguer anomaly as 

dg o2 {k) = X (fc)dHo 2 (k), (32) 
defining one more function, which acts like a harmonic "upward continuation" operator (Blakelv 1995), 

X (k) = 2nGA 2 e kZ2 . (33) 

At this point we remark that topography and gravity, in one form or another, are the only measurable geophysical quantities to help us 
constrain the value of D. The Bouguer anomaly g o2 is usually computed from the free-air anomaly g ao and the topography Hoi, assuming a 
density contrast Ai . Any estima tion problem th at deals with any combination of these variables should thus yield results that are equivalent to 
within the error in the estimate dTarant ola 2005), though whether the free-air or the Bouguer gravity anomaly is used as the primary quantity 
in the estimation process could have an effect on the properties of the solution depending on the manner by which it is found — a paradox 
that this paper will eliminate. 

2.5 Observables, deconvolution, and loading 

We are now in a position to return to writing explicit forms for the theoretical observables from whose particular realizations (the data), 
ultimately, we desire to estimate the flexural rigidity D. These are the final "surface" topography, given by combining eqs l |15l > and i24l as 

dHoi (k) = dHu (k) - dH 22 (k) Ar 1 A 2 0" 1 (k) . (34) 

By analogy we shall write for the final "subsurface" topography that which we can obtain by "downward continuation" dBlakefrll 1995b of 
the Bouguer gravity anomaly. From eq. ( 1321 . or combining eqs d!6t and d22t this quantity is then 

dH a2 {k) = X ~ 1 (k)dg o2 (k) = -d'Hii(]i)A 1 A 2 1 C 1 (k) + dn 22 (k). (35) 

The dependence on the parameter of interest, the flexural rigidity D, is non-linear through the "lithospheric filters" 4> and £. While both Hoi 
and H o2 can thus be "observed" (or at least calculated from observations) we are f or the moment taciturn about the complexity caused by 
the potentially unstable inversion of the parameter \ ( see also lKirbv & Swainll201 lh . We return to this issue in Section[5] 

Combining eqs dl7M18t with eqs d22ll-ll24t and then substituting the results in eqs (B4 M351 yields the equations th at relate 



the ob served topographies on either interface with the applied loads. Without changing from the expressions first derived by Forsyth] 
dl985h these have come to be called the "load-de convolution" equations l lLowrv & Smith 19941: iBanks et alj|200ll ; ISwain & Kirbv 2003a; 
iPerez-Gussinve et al .120041 : iKirbv & Swa in 2008a. b). They are usually expressed in matrix form as 
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tfHoi(k) 



Ai+A 2 C(fc) 

-Ax 
Ai + A 2 C(fc) 



-A 2 
A 1( j}(k) + A 2 

Ai^(fc) 
Ai^(fc) + A 2 



with the inverse relationships given by 
d"H 2 (k) 



^(fc)C(fc) 



~Al 

Ai0(fc) + A 2 

25 



ffii(k) 
cW 2 (k) 



Ai + A 2 £(fc) 
SI 

g(fc)[A^(fc) + A 2 ] 
£1 



(36) 



d-ftoi(k) 
tfH o2 (k) 



(37) 



It should be noted that when D = 0, in the absence of any lithospheric flexural strength, thus in the case of complete Airy isostasy, — 1 at 
all wavenumbers, and no such solutions exist. In that case the problem of reconstructing the initial loads has become completely degenerate. 

Armed with these solutions we can solve for the equilibrium loads. Combining eqs dl7M18t with eqs J22M24t returns usable forms 
for Hn and H22, and substituting the results back into eqs d22t-(l24t returns H12 and H21, all in terms of the initial loads Hi and "% 2 , as 



dWii(k) = 
dWia(k) = 



tfHi(k) 



A!+A 2 £(fc) 

-Ai 
Ai + A 2 £(fc) 



and 



and 



dH 22 (k) 



= dH 2 (k) 
= d"H 2 (k) 



Ai^(fc) 
Ai0(fc) + A 2 

~A 2 
A 1( f>(k) + A 2 ' 



(38a) 



(38b) 



(39) 
(40) 



To complete this section we formulate the initial-loading stresses, in kgm x s , at each interface as 

Xi = gA x U u 
I2 — gA.2H2- 

All variables that we have introduced up to this point are listed in TableQ] to which we further refer for units and short descriptions. We 
are now also in the position of further interpreting Fig. [T] once again drawing the readers' attention to the heart of the problem, which is the 
estimation of the single parameter, the flexural rigidity D, which is responsible for generating, from the initial loads (left), the equilibrium 
topographies (middle) whose summed effects (right) we observe in the form of "the" topography and the (Bouguer) gravity anomaly. 



2.6 Admittance and coherence 

Modeled after eq. {7), the Fourier-domain relation between the theoretical observable quantities that are the surface topography 7-Loi and the 
Bouguer gravity anomaly g a2 is encapsulated by the complex-valued theoretical Bouguer admittance, which we define as 

(dg o2 {K)d-HUK)) _ ( , (du 2(k)dHUk)} 

A quantity whose expression eliminates the dependence on the location of the first interface contained in the term \ °f e 1- J32t is the 
real-valued Bouguer coherence-squared, the Cauchy-Schwarz bounded quantity 

y 2 (k)= \(dg o2 (M)duum 2 \{dH o2 (k)dHUm 2 0< 2 fkK1 (42) 

/o( ] (dH 01 (k)dH* 01 (k)}(dg o2 (k)dg* o2 (k)} (dw 01 (k)dH: 1 (k))(dH o2 (k)dw: 2 (k))' u -t°^- • ^> 

As illustrated by eqs (I9l-l ll3t . similarly, the values of either ratio when calculated using actual observations Hoi and G o2 or H02, with 
or without explicit windowing, will be estimators for eqs (14 It and ( 142b . but will never manage to recover more than a blurred version of 
the true cross -power spectral density ratios that they ar e, and with an estimation variance that will de pend on how the required averaging is 
implemented |Thomsonll982l:|Percival & Waldenl 19931). Despite the various attempts by many authors llDiamentll985l : lLowrv & SmitrJl994l 
ISimons et al]|2000l |2003l : iKirbv & Swairjl2004 l2o"iTlAudet & Mareschalll2007l : ISimons & Wandl201 ll) to design optimal data treatment, 



wavelet or (multi-)windowing procedures, with the common goal to minimize the combined effect of such bias or leakage and estimation 
variance, in the end this may result in a well-defined (non-parametric) estimate for coherence and admittance, but the actual quantity of 
interest, the flexural rigidity, D, still has to be determined from that. As we wrote in the Introduction, understanding the statistics of the 
estimators for D derived from estimates of coherence or admittance depends on fully characterizing their distributional properties: a daunting 
task that, to our knowledge, has never been successfully attempted. Without this, however, we will never know which method is to be preferred 
under which circumstan ce. Moreover, we will never be able to properly characterize the st andard errors of the estimates except by exhaustive 
trial and error (see, e.g., Perez-Gussinve et al. 2004; Crosby 2007; Kalnins & Watts 2009) from data that are synthetically generated. This is 



no trivial task (Macario et al. 1995; Ojeda & Whitman 2002; Kirbv & Swain 2008a. b, 2009); we return to this issue later. 



We have hereby reached the essence of this paper: our goal is to estimate flexural rigidity D from observed topography "Hoi and 
gravity <J o2 ; estimates based on inversions of estimated admittance and coherence have led to widely different results, a general lack of 
understanding of their statistics, and thus a failure to be able to judge their interpretation. We must thus abandon doing this via the intermediary 
of admittance Q a and coherence 70 , and rather focus on directly constructing the best possible estimator for D from the data. This realization 
is not unlike that made in the last decade by the seismological community, where the inversion of (group velocity? phase velocity?) surface- 
wave dispersion curves or individua l-phase travel-time measurements has made way for "full-waveform inversion" in its many guises (e.g. 
iTromp et alj|2005t iTape etai]|2007h . There too, the model is called to explain the data that are actually being collected by the instrument, 
and not via an additional layer of measurement whose statistics must remain incompletely understood, or modeled with too great a precision. 
In cosmology, the power-spectral density of the cosmic micr owave background radiation jDahlen & S imons 2008) is but a step towards the 
determination of the cosmological parameters of interest (e.g. ljungman et al.lll996l : lKnoxlll995l : IOh etalj 19991) . 
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unit 


description 


character 


eq. 


n 




flexural rigidity of the lithosphere 


estimated 


J2UJ 


9 


9 

ms ^ 


reference gravitational acceleration 


assumed 


<20l 


(t 


kg~ 1 s~ 2 


universal gravitational constant 


assumed 


Jnnl 


Za 


ni 


location of jth interface 


assumed 




A,- 


kgm - ^ 


density contrast across jth interface 


assumed 


hi 


Hi 


m 


initial topography applied by loading of interface 1 


calculable 




it- 2 


ni 


initial topography applied by loading of interface 2 


calculable 


CD 


- L 'L 


Pa 


initial load applied at interface 1 


calculable 






Pa 


initial load applied at interface 2 


calculable 


ED 


rtll 


111 


equilibrium topography of interface 1 produced by loading at interface 1 


calculable 


Jot! 


%21 


m 


equilibrium topography of interface 1 produced by loading at interface 2 


calculable 


I24> 


0.0 1 


m 


final topography of interface 1 resulting from all interface loading 


measured 


ED 


ni2 


m 


equilibrium topography of interface 2 produced by loading at interface 1 


calculable 


(22) 




ni 


equilibrium topography of interface 2 produced by loading at interface 2 


calculable 


ed 




m 


final topography of interface 2 resulting from all interface loading 


calculated 


ED 


c 

> 




filter relating topographies on both interfaces resulting from loading at interface 1 


calculated 


ED 


c/> 




filter relating topographies on both interfaces resulting from loading at interface 2 


calculated 




X 




filter by which final topography on interface 2 maps into Bouguer anomaly 


calculated 


ED 




m 


sum of all topographic expressions of all loading processes 


calculable 


ED 


Goo 


ms~ 2 


free-air gravitational anomaly due to all loading and flexure 


measured 


ED 


Go2 


ms -2 


Bouguer gravitational anomaly due to all loading and flexure 


calculated 


ED 


Qo 


2 —2 

m s 


complex admittance of Bouguer anomaly and topography 


estimable 


ED 


<Y 2 

/O 




real coherence-squared of Bouguer anomaly and topography 


estimable 


ED 




m 2 


(cross-)spectral density between initial topographies at interfaces i and j 


estimated 


ED 




in 2 


(cross-)spectral density between final topographies at interfaces i and j 


estimable 


ED 


r 




correlation coefficient between initial loading at interface 1 and 2 


estimated 


ED 


f 




ratio of the spectral densities of the initial loads at interface 1 and 2 


estimated 


ED 


Qj 


s- 2 


Bouguer/topography admittance for uncorrelated proportional loading at both interfaces 


estimable 


ED 


Si 


S - 2 


Bouguer/topography admittance for loading only at interface 1 


estimable 


ED 


Q 2 


s" 2 


Bouguer/topography admittance for loading only at interface 2 


estimable 


ED 






Bouguer/topography coherence for uncorrelated proportional loading at both interfaces 


estimable 


ED 



Table 1. Subset of symbols used in this paper, their units and physical description, their role in our estimation process, and relevant equation numbers. 



3 THE STANDARD MODEL 

The essential elements of a geophysical and statistical nature as they had been broadly understood by the late 1970s were reintroduced in the 
previous sectio n in a consistent framework. In this section we disc uss the important innovations and simplifications brought to the problem by 
iForsvthl ( fl985h . In a nutshell, in his seminal paper, IForsvthl ( fl985h made a series of model assumptions that resulted in palatable expressions 
for the admittance and the coherence as defined in eqs ( I41t and i42l , neither of which would otherwise be of much utility in actually "solving" 
the problem of flexural rigidity estimation from gravity and topography. The first two of these were already contained in eq. HOY , loading and 
compensation occur discretely at one and the same set of interfaces, and the constant describing the mechanical behavior of the system is a 
scalar parameter that does not depend on w avenumber nor direction. The first assumption might be open for debate, and indeed alternatives 
have been considered in the literature (e.g. banks et"ai]|l977l 1200 lh . but reconsidering it would n ot fundamentally alter the nature of the 
problem. The second: iso t ropy of the lithosphere, which is certainly only a null hypothesis (see, e.g. lStephenson & Beaumo nt 1980; B echtell 
1989; ISimons et al.ll200Cil2003l : ISwain & Kirbvll 2003d : iKirbv & SwairJhood and many observational studies that work on the premise that 
it must indeed be rejected), does require a treatment that is to be revisited but presently falls outside the scope of this work. To facilitate the 
subsequent treatment we restate the equations of Section |2~5l in matrix form. 



3.1 Flexure of an isotropic lithosphere, revisited 



We shall consider the primary stochastic variables to be the initial-loading topographies Hi and H2, respectively, and describe their joint 
properties, and their relation to the theoretical observable final topographies Hoi and "Ho2 by defining the spectral increment process vectors 



dU(k) = 



dHi(k) 
dH 2 (k) 



and cfHo(k) = 



tfHoi(k) 



Subsequently, we express the process by which lithospheric flexure maps one into the other in the shorthand notation 
tfHo(k) = M D (k) dU(k) and tfK(k) = M^fe) dH a (k), 



(43) 



(44) 
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where the real- valued entries of the non-symmetric lithospheric matrices M_o(fc) and M^, 1 (k) can be read off eqs d36t-d37t and the func- 
tional dependence on the scalar constant flexural rigidity D is implied by the subscript. We now define the (cross-)spectral densities between 
the individual entries in the initial-topography vector dH(k) as in eq. Q by writing 

{dHi(k.)dH*j(k')) = S tJ (k) dkdk'8(k, k'), (45a) 
and form the spectral matrix <S(k) from these elements using the Hermitian transpose as 



{dU{k)dU H (k')} = 5(k) dkdk'S(k,k') 



5n(k) 5i 2 (k) 
5 2i (k) 5 22 (k) 



dkdk'S(k,k'). (45b) 



Lithospheric flexure transforms the spectral matrix of the initial topographies, 5(k), to that of the final topographies, <S (k), defined as 
(dHo(k)dH? (k')) = 5 D (k) dkdk'5(k,k') and (dn oi (k)dH* oj (k')) = S oij {k) dkdk'S(k,k'), (46) 
via the mapping implied by eqs J44t through ( 146) . We specify 

5 (k) = M D {k)S(k)Ml(k). (47) 
We can now see that the theoretical admittance and coherence of eqs can equivalently be written as 

s-\ / 1 \ n ,So2l(k) 2/i \ |5o2l(k)| 2 

Q ° (k)=X(fc) 5^(k) ^ 7 ° (k)= 5 011 (k)5 22(k) ' (48) 

which explains why so many authors before us have focused on admittance and coherence calculations as a spectral estimation problem. 

To be valid spectral matrices of real-valued bivariate fields, the complex- valued S(k) and <S (k) only need to possess Hermitian 
symmetry, that is, invariance under the conjugate transpose, and be positive-definite, that is, have non-negative real eigenvalues. The spectral 
variances of the initial and final topographies at the individual interfaces, 5n(k) > and 522 (k) > 0, both arbitrarily depend on k, but 
without dependence between k 7^ k'. The only additional requirements are that 5i2(k) = 52i(k) and |5i2(k)| 2 < 5n(k)52 2 (k). The 
general form of 5(k) as a stationary random process can be rewritten with the aid of a coherency or spectral correlation coefficient, r(k), 
which expresses the relation between the components of surface and subsurface initial topography as 

r(k) = — — Sl2 ^ where |r(k)| < 1 for all k. (49) 

^SMkj^ik) 



S(k) 



for all k, (50) 



This correlation coefficient is in general complex-valued as the two fields may be spatially slipped versions of one another. The representation 

5n(k) r(k) v /5ii(k)^/5 22 (k)' 

r*(k) x /5n(k)v'522(k) 5 22 (k) 

is simply a most complete description of a bivariate random spectral process (Christakos 1992). 

Should we make the additional assumption of joint isotropy for all of the loads, the spectral matrices would both be real and symmetric, 
<S(k) = S(k) and 5 (k) = S (k). In keeping with the notation from eq. ([8}, we would require a spatial covariance matrix to only depend 
on distance, not direction. With 6 the angle between k and x — x' we would have the real-valued 

C(x-x') = JJ e lk(x " x,) 5(k)dk = JJ e' kl]x - x ' llcoBt> S(k)d6kdk = 2-K J J (k\\x - x'||) S(k) kdk = C(||x - x'||), (51) 

with Jo the real-valued zeroth-order Bessel function of the first kind. With S real, the spectral variances and covariances between top and 
bottom loading components would all be real-valued and so would the correlation coefficient r(k) = r(k). It is important to note that the 
isotropy of the fields individually does not imply their joint isotropy. Two such fields can be spatially slipped versions of one another, but 
with slippage in a particular direction the fields may remain marginally isotropic but their joint structure will not. 



3.2 Correlation between the initial loads 

Statistically, eqs 045 1 > and J49b imply that the initial-loading topographies on the two interfaces are related spectrally as 

dU 2 (k) = r(k)^ SS cffli(k) + dUj(k) = p(k)dUi(k) + dUi(k), (52) 
y/Su(k) 

whereby T-L^ (x), the zero-mean orthogonal complement to Hi (x), is uncorrected with it at all lags. The interpretation of what should cause 
a possible "correlation" between the initial-l oading topogra phies must be geologic al (McGo vern et ali 2002; McKenzie 2003; Bellegui c et al.l 
l2005l : IWieczorekl2007tlkirbv & Swainl2009h . Erosion (e.g. lStephensonll984l ; lAharonson et al.l200ll) is typically amenable to the description 
articulated by eq. ((52}, though much work remains to be done in this area to make it apply to the most general of settings. Under isotropy of 
the loading, the implication is that the initial subsurface loading H2 (x) can be generated from the initial surface loading Hi(x) by a radially 
symmetric convolution operator p(x), 

%(x)= // p(x-x')Hi(x')dx'+^5 L (x). (53) 
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By selecting the initial loads Hj as the primary variables of the flexural estimation problem, and not the equilibrium Hij or final 
loads Hoj, we now have the correlation r between the initial loads to consider in the subsequent treatment. Geologically, this puts us in a 
bit of a quandary, since if eq. ( 152) holds, this can only mean that one loading process "follows the other in time", "reacting to it". However, 
the temporal dimension has not entered our discussion at all, and if it did, it would certainly make sense to choose the correlation between 
the equilibrium load on one and the initial load on the other interface as the one that matters. The linear relationship (144) between the loads 
renders these two viewpoints mathematically equivalent. Our definition of eq. ( 149) is chosen to be mathematically convenient because it is 
most in line with th e choices to be made in the next section. 

iForsvthl dl985l) deemed correlations between surface and subsurface loads to be potentially important but he did not make the deter- 
mination of the correlation coefficient ( 149) part of the estimation procedure for the flexural rigidity D, which was instead predicated on the 
assumption, his third by our count, that r(k) = 0. He did recommend computing the correlati on coefficient be tween the i nitial loads via 
eq. (|44), after the inve r sion for D, and us ing the results to aid with t he interpretation (see, e.g. JZuber et al.|[l"989l) . Studies bv lMacario et alj 
dl995l)JCrosbvl l l2007h . lWieczorekl d2007l) and lKirbv & Swain (2009) have since shed more light on how to do this more quantitatively, but to 
our knowledge no-one has actually attempted to determine the best-fitting correlation coefficient as part of an inversion for flexural rigidity. 



3.3 Proportionality between the initial loads 

IForsvthl (1985) introduced the 'loading fraction' as the subsurface-to-surface ratio of the power spectral densities of the initial-loading 
stresses I2 and Ti, and thus from eqs d39)-l l40) and ( 145 ) we can write 

a _ <<cb(k)cg;(k)) Aj5 22 (k) 

I() (dl^dlKk)} A?5 u (k)' 7 - P ' 

This definition is fairly consistently applied in the literature (e.g. [Banks et alj200lh . though iMcKenzid 120031 ") has pr eferred to parameterize 
by the fraction each of the loads contributes to the total, which is handy for situations with multiple interfaces (see Kirby & Swain 2009) 
and subsurface-only loading. Eq. d54) is a statement of pro portional i ty of t he power spectral densities of the initial loads, 52 and Si. With 
this constraint, which we identify as his fourth assumption. IForsvthl d 19851) was able to factor Sn out of the spectral matrix S in eq. {45}, 
which as we recall from the previous section, by his third assumption had no off-diagonal terms, to arrive at simplified expressions for <S of 
eq. J46fr . which acquires off-diagonal terms through eq. (147 l l. and ultimately for the admittance Q and coherence 7^ in eq. d48) . We revisit 
these quantities in the next section but conclude with the general form of the initial-loading spectral matrix that is implied by the definition 
of proportionality, which is 



S(k)=«Sn(k) 



1 r(k)/(k)A 1 A 2 " 1 
r'MfMA^ 1 / 2 (k)A 2 A 2 " 2 



(55) 



With what we have obtained so far: flexural isotropy of the lithosphere, Mo(fc), correlation of the initial-loading processes, r(k), and 
proportionality of the initial-loading processes, / (k), the spectral matrix ( 147) of the final topographies — those we measure — is given by 

So (k) = Su (k) To (k) = Su (k) [T(k) + AT(k)] , (56) 

where we have defined the auxiliary matrices 

T(u ,_f C 2 + / 2 (k)A?A 2 ~ 2 -A.A^-fj^AlA-^f A 2 V 

S -W — _A A"lt_f2^U3A-3^ A2 A-2 , f2.. x A 4 A -4 ,2 A , A- e ' ^ 1 > 



-AiA^-Z^AfAj 3 ^ A 2 A 2 - 2 + / 2 (k)A?A 2 -V J \ Ar + A 2 £ 

We define both T and AT so that we can easily revert to a model of zero correlation, in which case AT = 0. Note that we are silent about 
the dependence on wavenumber by using the shorthand notation £ and <j) for the lithospheric filters (123) and ( 125) , but have kept the full forms 
of the correlation coefficient r(k) and the loading ratio / 2 (k) to stress that they are in general functions of the wave vector as defined by 
eqs ( 149) and ( 154) . In general r will be complex and of magnitude smaller than or equal to unity, and f 2 (and /) will be real and positive. 



3.4 Admittance and coherence for proportional and correlated initial loads 

Via eqs d56b-ll58) we have explicit access to the (cross-)spectral densities between the individual elements in the final-topography vector dH a , 
as required to evaluate eq. (146) . We shall now consider those for the special case where both r(k) = r and / 2 (k) = / 2 are constants, no 
longer varying with the wave vector. Then, following eq. ( 148) . we obtain simple expressions for the admittance and coherence that we shall 
further specialize to a few end-member cases for comparison with those treated in the prior literature. We hereby complete TableQ]to which 
we again refer for a summary of the relevant notation. 

The Bouguer- topography admittance, for correlated and proportional initial loads with constant correlation r and proportion / 2 , is 

= -2,GA ie ^ * + f2A ^y 7/^^+ *] , (59) 
e + / 2 A 2 A 2 " 2 - 2r/A 1 Aj 1 £ 

Spectrally, this is a function of wavenumber, k, only, since the power spectra of the loading topographies, which both may vary (similarly, 
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because of their proportionality) with the wave vector k have been factored out. This admittance can be complex-valued since the load 
correlation may be, unless the power spectra of the loading topographies are isotropic. At k = the admittance yields the density contrast Ai. 
Assuming that the loads are uncorrelated but proportional simplifies the Bouguer- topography admittance to the familiar expression 

Q f (k) = ^GA^ l+ff^t . (60) 

In scenarios where only top or only bottom loading is present, we get the original expressions jTurcotte & Schubertlll982UForsvfhlll985h 

Qi(fc) = -2 7 rGAir 1 e fcZ2 , (61) 
Q 2 (k) = ^TrGA^e** 2 , (62) 
where, as expected and easily verified, 

IxmQf-tQi and lim Q f -> Q 2 . (63) 

f=0 / = oo 

The Bouguer-topography coherence, for correlated and proportional initial loads with constant correlation r and proportion f 2 , is 

2 (g + fAfA£^ - r/Ar A,- 1 + 1]) 2 

7o( ] ( 5 2 + / 2 A?A 2 - 2 ~2r/A 1 A 2 - 1 e) (l + / 2 A 2 A 2 - 2 ^-2r/A 1 A 2 -V)' ( } 



which, as the admittance, is a function of wavenumber k regardless of the power spectral densities of the loading topographies. Unlike the 

admittance it has lost the dependence on the depth to the second interface, z 2 , and it is always real, < j 2 < 1. 

When the initial loads are uncorrelated but proportional the Bouguer-topography coherence is, as according to lForsvthl dl985h . simply 

a _ (C + / 2 A 2 A 2 - 2 ^) 2 

7/W " (C 2 + / 2 A 2 A 2 2 ) (1 + / 2 A?A 2 -V)- (65) 
This expression was solved bv lSimons et al.l d2003l) for the wavenumber at which j 2 — 1/2, the diagnostic dSimons & van derHilstll2002l) 



\2Df 



1/4 

A 2 -/(A 1 +A 2 ) + / 2 A 1 + ^ 



(66) 



+ 2/ (Ai - Ai A 2 ) + f (A 2 + A.I + 4Ai A 2 ) - 2/ 3 (Ai A 2 - A?) + f 4 A'(. In the pa per bvlSimons et al.l d2003l) 



eq. «66b appears with a typo in the leading term, which was briefly the cause of some confusion in the literature (IKirbv & Sw ain 2008a. b). 

Fig.|2]displays the individual effects that varying flexural rigidity, loading fraction and load correlation have on the expected admittance 
and coherence curves. Regardless of the fact that much of the literature to this date has been concerned with the estimation of the admittance 
and coherence from the available data, and regardless of the justifiably large amount of attention devoted to the role of windowing and 
tapering to render these estimates spatially selective and spectrally free from excessive leakage; regardless, in summary, of any practicality to 
the actual methodology by which admittance and coherence are being estimated and how the behavior of their estimates affects the behavior 
of the estimated parameter of interest, the flexural rigidity, D, we show these curves to gain an appreciation of the complexity of the task 
at hand. No matter how well we may be able to recover the "true" admittance and coherence behavior, the issue remains that they need to 
be interpreted — inverted — for a model that ultimately needs, or can, return an estimate for D but also of the initial-loading fraction, / 2 , 
and also of the correlation coefficient, r. Each of these have distinct sensitivities but overlapping effects on the predicted behavior of the 
measurements: selecting one end-member model (top-loading or bottom-loading only, for example, or disregarding the very possibility of 
load correlation, or imposing a certain non- vanishing value on the loading fraction or load-correlation coefficient) remains but one choice 
open to alternatives, and constraining all three is a task that, thus far, nobody has successfully attempted. Fig.|2]serves as a visual reminder 
of the limitations of admittance- and coherence-based estimation. However much information these statistical summaries of the gravity and 
topography data contain, it is not easily accessible for navigation in the three-dimensional space of D, f 2 and r. 



3.5 Load correlation, proportionality and the standard model 

The expres sions in the pre vious section show how difficult it is to extract the model parameters D, f 2 and r individually from admittance or 
coherence. iForsvtril (fl985) argued that coherence depends on f 2 much more weakly than admittance, but what is important for the estimation 
problem is how the three parameters of interest vary together functionally: whether they occur in terms by themselves or as products, in 
which variations of powers, and so on. The geometry of the objective functions used to estimate the triplet of parameters, together with the 
distribution of any random quantities the objective functions contain, determine the properties of the esti mators. We ret urn to the question of 
identifiability after we have presented the new maximum-likelihood estimation method. For that matter. iForsvthl < fl~985h suggested ignoring 
the load correlation, setting r = 0, and finding an estimate for the flexural rigidity D using a constant initial guess for the loading fraction f 2 
and the coherence modeled as 7 2 in eq. d65t . and then using eqs d37| (, d39M40t and d54t to compute a wavenumber-dependent estimate of / 2 , 
which can then be plugged back into eq. d65t as a variable, and iterating this procedure to convergence. However, this allows for as many 
degrees of freedom as there are "data", thereby running the risk that an ill-fitting D can be reconciled with the data by adjustment with a very 
variable f 2 . It is unclear in this c ontext what "ill-fitting" or "very variable" should mean, and thus it is hard to think of objective criteria to 
accomplish this. Mc Kenziel 1200 3h showed misfit surfaces for the (free-air) admittance for varying D and varying f 2 held constant over all 
wavenumbers. These figures show prominent trade-offs, suggesting a profound lack of identifiability of D and f 2 with such a method. 
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Figure 2. Expected values of the admittance and coherence between Bouguer gravity anomalies and topography in two-interface models, derived in Section lT4l 
All models have identical density structures, Zi = km, 22 = 35 km, Ai = 2670 kgm - 3 and A2 = 630 kgm -3 , Young's and Poisson moduli 
E = 1.4 X 10 11 Pa and v = 0.25. (Left column) Admittance curves for top-only (J 2 = 0) and bottom-only (f 2 = 00) loading as a function of the effective 
elastic thickness, T e (top left); for mixed-loading models at constant T e = 40 km with varying loading fractions f 2 , but without load correlation (middle left); 
and for models at constant T e = 40 km and f 2 = 1 but with various load-correlation coefficients r (bottom left), as indicated in the legend. (Right column) 
Coherence curves for a fixed-loading scenario at constant f 2 = 1 but with various values for T e (top right); for constant T e = 40 km and varying values 
of f 2 , without correlation (middle right); and at constant f 2 = 1 and T e = 40 km but for varying load correlation r (bottom right), as annotated. 
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Even more importantly. iMcKenzid J2003h emphasized the possibility of non-zero correlations between the initial loads, deeming those 
prevalent in many areas of low-lying topography, on old portions of the continents: precisely where the discrepancy be tween es t imates 
for elas tic thickness derived from di fferent methods has been leading to so much controversy. As an alternative to the iForsvthl Jl985h 
method. iMcKenzie & Fairh ead 1 1997) suggested estimating D and f 2 from the free-air admittance in the wavenumber regime where surface 
topography and free-air gravity are most coherent. The rationale for this procedure is tha t there might be loading scenarios resulting in gravity 
anomalies but not (much) topogra phy, a situation not accounted for in the lForsvm1(ll985l) model that can, however, be described by initial-load 
correlation. iKirbv & S wain (2009J), most recently, discussed the differences between both approaches, only to conclude that neither estimates 
the complete triplet (D, f 2 , r) of parameters (rigidity, proportionality, correlation) without shortcuts. Once again the statistical understanding 
required to evaluate whether either of these techniques results in "good" estimators is lacking. 

That the cause of "internal loads without topographic expression" can indeed be attributed to correlation in the sense of l |49t can be 
readily demonstrated by considering what it takes for the final, observable, surface topography Hoi to vanish exactly. Solving eq. d36t or 
eq. d44t > and using eqs ((23) and J25t returns the conditions that the first and second initial topographies are related to each other as 

dH2(k)=i(k)dHi(k), (67) 

which, using eqs l !45t , J54t and d49t , implies the following equivalent relations between them: 

S aa (k) = e(*)5i a (k)=|(A)5ai(k)=€ 2 (*)5ii(k), f 2 (k) = A^ 2 A 2 2 f(k), r = 1. (68) 

This set of equations together with our model very str ongly constrain both fields. Thus, as noted bv lMcKenzij lT2003) and others after him 
(Crosby 20071: IWieczorekll2007L iKirbv & Swai n 2009), a situation of internal loading that results in no net final topography may arise when 
the initial-loading topographies are perfectly correlated, balancing one another according to eqs fl67)-d68l. We can find a more complete 
condition for this scenario by equating eqs ((67} and (15 2t . which returns an expression for the orthogonal complement dHi\ when this is 
required to vanish non-trivially we obtain the seemingly more general condition 

r(k)/(k) = ^e(fc), < r(k) < 1. (69) 
Requiring that the final surface topography have a vanishing variance Son, substituting eqs J56M58t into eq. i46l . we need to satisfy 

rCk)/(k) = gM^, 0<r(k)<l. (70) 



The correlation coefficients in eqs d69)-l|70) must be real-valued since all of the other quantities involved are. Both eq. l |69t and eq. d70| l 
should be equivalent, and together they imply eq. d68t . We are thus left to conclude that for the observable surface topography to vanish, 
the correlation between initial surface and subsurface loading must be perfect and positive, r = 1. Solving the quadratic equation ((70} for / 
yields real-valued results only when \r\ 2 — 1 > 0, thus r — 1 for positive but non-constant /, as expected. 

The above considerations have put perhaps unusually strong constraints on the spectral forms of the final topography Hoi (k) or 5 n (k). 
From eq. ((3} we learn that in doing so, the spatial-domain observables 'Hoi(x) can never be non-zero. On the other hand, an observed 
Hoi (x) could be zero over a restricted patch without its Fourier transform or its spectral density vanishing exactly everywhere. Alternatively, 
it can be very nearly zero, and this may also practically hamper approaches based on admittance or coherence which contain (estimates 
of) the term 5oii(k) in the denominator (see eq. |48t. When the observed topography becomes small, hi gher-order negle cted terms may 
become prominent. Furthermore, there may be mixtures of loads with and without topographic expression dMcKen zie 2003). Speaking quite 
generally, there will be areas with some correlation between the initial loads, and we should take this into account in the estimation. Either 
one of the load correlation or load fraction may vary with wavenumber. What emerges from this discussion is that the isotropic flexural 
rigidity D, the initial-load correlation r(k), and t he initial - load p roportionality / 2 (k) should all be part of the "standard model" of flexural 
studies. The last two concepts were introduced bv lForsvthldl985[) . even though he did not further discuss the case of non-zero correlation. 

As we wrote in the first paragraph in this section, Forsyth's first assumption was that the depth of compensation and the depth of loading 
in fact coincide. He writes that the assumption of collocation of these hypothetical interfaces and their precise location at depth in Earth may 
well be the largest contributor to uncertainty in the estimates for flexural strength, but also that there may be a priori, e.g. seismological, 
information to help constrain the depth z%. Thus, much like the density contrasts Ai and A2, we will not include the depth to the second 
interface z 2 as a quantity to be estimated directly. Rather, we will consider them known inputs to our own estimation procedure and evaluate 
their suitability after the fact by an analysis of the likelihood functions and of the distribution of the residuals. 



4 MAXIMUM-LIKELIHOOD THEORY 

Measurements of "gravity" and "topography", which we consider free from observational noise, can be interpreted as undulations, Hoi 
and H02, of the surface and one subsurface density interface, with density contrasts, Ai and A2, located at depths Z\ — at z 2 in Earth, 
respectively. Geology and "tectonics" produce initial topographic loads, Hi and Hi, on these previously undisturbed interfaces. These are 
treated as a zero-mean bivariate, stationary, random process vector, dfi, fully and most generally d escribed by a spectral matrix, <S(k), 
under the assumption that the higher-order moments of "H-(x) are not too prominent ( IBrillingerlll975l) . For this paper we assume isotropy 
of the loading process, <S = S(k). The lithosphere is modeled as a coupled set of differential equations, whose action is described by the 
spectral-domain matrix M_d, which depends on a single, scalar parameter of interest, the flexural rigidity D. Since our observations have 
experienced the linear mapping dH = M_d dH, their spectral matrix is S (k) = Mij(fc)<S(fc) Mj(fc), and the objective is to recover D, 
we are led to study S Q (k). This includes its off-diagonal terms, which depend on the correlation coefficient of the loads at either interface, 
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— 1 < r(k) < 1, recall r(k) £ R, and, under the assumption of proportionality of the initial-loading spectra, on a loading fraction, f 2 (k). 
As part of the estimation we will thus also recover information about the loading process <S. 

All previous studies in the geophysical context of lithospheric thickness determination have first estimated admittance and coherence, 
ratios of certain elements of <S whose estimators have joint distributions that have not been studied. These were then used in inversion for 
estimates of D whose statistics have remained unknown. In the remainder of this paper we construct a maximum-likelihood estimator sensu 
Whitt2 i 1953b . directly from the data "gravity" and "topography", and the "known" parameters Ai, As, and z-x. The unknowns are D, r 
and f 2 , and, as we shall see shortly, three more parameters by which we guarantee is otropy of the lo ading process S through a commonly 
utilize d functional form. That this is more ambitious than the original objectives by iForsvthl dl985l) and the modifications by McKenzie 
( 2003) is because the reduction of the data to admittance or coherence obliterates information that we are able to recover in some measure. 
We study the properties of the new estimators and derive the distributions of the residuals. When the procedure is applied to actual data, these 
should tell us where to adjust the assumptions used in designing the model. 



4.1 Choice of spectral parameterization, a 2 ,u, p 

In the above we have seen that the primary descriptor of what causes the observed behavior is the spectral matrix S(k) from which the initial 
interface-loading topographies are being generated. After the assumption of spectral proportionality of the loading at the two interfaces, the 
expressions for admittance and coherence no longer contain any information about this particular quantity, though of course the deviations 
of the observed admittance and coherence from the models discussed in Section [3^41 still might. However, this information is no longer in 
an easily accessible form. Furthermore, coherence and admittance are typically estimated non-parametrically: the infinitely many, or rather, 
2K = M x N dimensions of the data are reduced to a small number of wavenumbers at which they are being estimated, thus there is a loss of 
O(K) degrees of freedom. At the low frequencies, most tapering methods experience a further reduction in resolution, which is detrimental 
especially in estimating the value of thick lithospheres from relatively small data grids, as is well appreciated in the geophysi cal literature. 

H ere, w e will simply parameterize the in itial loading using a "red " mode l, thereby avoiding such a loss. We may consult lGoff & Jordanl 

( 1988. 1l989h . lCarpentier & Rov-Chowdhurvl d2007l) or lGneiting etall feOld) for such models. Here we do, however, make the very strong 
assumption of iso tropy. This is unlikely to be satisfied in real-world situations, as spectral-domain anisotropy is pa rt and parcel of all geo- 
logical processes dGoff et al.|[l99ll ; ICarpentier & Rov-Chowdhury||2009l ; ICarpentier et al.ll2009l: iGoff & Arbicll2010b . Relaxing the isotropic 
loading assumption introduces considerable extra complications. Our reluctance to handle anisotropic loading situations stems from the fact 
that their estimation might be confused statistically with a possible anisotropy in the lithospheric response: we can thus not easily study one 
without studying the other. 

At this point we collect the parameters that we wish to estimate into a vector. To begin with, the "lithospheric" parameters, flexural 
rigidity D, loading ratio / and load correlation r are 

6 L = [D f r] T . (71) 

We denote a generic element of this vector as 9l- F or the spectrum of the initial-load i ng topographies we choose the isotropic Matern spectral 
class, which has legitimacy in geophysical circles dGoff & Jordanlll988l ; ISteinll999l : lGuttorp & Gn eiting 2006). We specify 

= ./..»„ + * , (72) 



7r(7rp) 2l/ V 7r 2 p 2 j 
whose parameters we collect in the set 

6 S = [a 2 v p] T , (73) 

with generic element 6s- The third parameter, p, is distinct from the mass density, as will be clear from the context. The full set of parameters 
that we wish to estimate problem is contained in the vector 

6 = [Ol 0j] T = [D f r a 2 v pf , (74) 

whose general element we denote by 9. For future reference we define the parameter vector that omits all consideration of the correlation as 

= [D f a 2 v pf. (75) 

Fig.[3]shows a number of realizations of isotropic Matern processes with different spectral parameters. As can be seen the pa rameters a 2 
("variance") an d p ("range") impart an overall sense of scale to the distribution while v ("differentiability") affects its shape ( Stein] [l99^ ; 
|Paciorekll2007l) . 



4.2 The observation vectors, dH and H 

In Section [2] we introduced the standard statistical point of view on stationary processes dBrillingerlll975t [Percival & Walden 1993). We 
specified how this applies to a finite set of geophysical observations that can be defined in a two-layer system, which we revealed to be 
the various types of "topography" and "gravity", and which are mapped into one another by the differential equations describing "flexure". 
Subsequently, we introduced the matrix formalism that describes the connections between the various geophysical observables and the initial 
driving forces that produce them, which we used extensively in Section [3] to discuss the standard approach of determining the unknown 
parameters of the flexural differential equation a nd the relative importance and c orrelation of the loading processes acting across either layer 
interface, which are of geophysical interest (e.g. iForsvthll 19851: lMcKenzidl2003l) . To address the problem of how to properly estimate these 
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Figure 3. Synthetic "topographies" generated from the Matern spectral class with parameters cr 2 , v and p as indicated in the legends. (Top row) Power spectral 
densities as given by eq. \12\ . (Bottom row) Spatial realizations of Gaussian random processes with the power spectral densities as shown in the top row. 



unknowns and their distribution, we now return to the statistical formalism espoused in Section l2Tl in order to clarify how the "theorized" 
geophysical observables, i.e. the spectral processes describing the various kinds of topography dH(k) and gravity anomalies dQ(k) are being 
shaped into the "actual" observations. Those are the windowed Fourier transforms H(k) and G(k) of particular realizations of topography 
and gravity as we can calculate from finite spatial data sets 'H(x) and <5(x) measured in nature. In the spectral domain we continue to 
distinguish by the choice of font the theory (calligraphic) from what we can actually calculate (italicized). In the spatial domain, there is no 
need to define anything but H(x) or Q (x). 



4.2.1 In theory: infinite length and continuous 

We recall that the spectral matrix So (k), given by eq. l |56t , of the vector of final, observable, topographies dtio (k) defined in eqs i|43|l-l|47|l. 
is separable in the sought-after parameter vectors 9s and Ol by the factoring of the spectral density Sn(k) of the initial-loading topographies, 

5 (k) = 5u(fc)T (k) = Sn(fc) [T(k) + AT(k)] . (76) 

In writing eq. ((76) we emphasize the wavenumber-only dependence of the "spectral" matrix Sn(fc), which is isotropic, but keep the full 
wavevector dependence of the "lithospheric" matrices T(k) and AT(k) to make sure they have the same dimensions as the data. However, 
in the case of isotropic loading both T(k) and AT(k) will also only depend on wavenumber, and they will both be real. We thus rewrite 
eqs l|57M58t with the dependencies 4>{k), £(fc), r(k) and f 2 (k) implicit in this sense, 

T(k] .f ? 2 + / 2 A?A 2 - 2 -A 1 A a - 1 e-/ a AjA a - i V\ / A 2 V 



-A^^-Z'AfAj^ A 2 A 2 " 2 + / 2 A*A 2 - 4 2 )\A 1 +A 2 £ 
The Cholesky decomposition 

T (k) = L (k)Lj(k) (79) 



reverts to the Cholesky decomposition of T(k) when r = 0. Explicit expressions appear in Appendix |9.1| Because of the above relationships 
the transformed quantities 
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Figure 4. Synthetic data representing the standard model, identifying the initial, Hj , equilibrium, 'Hij , and final topographies, Hoj , emplaced on a lithosphere 
with flexural rigidity D. The initial loads were generated from the Matern spectral class with parameters <r 2 , p and u; they were negatively con-elated, 
r = —0.75, and the spectral proportionality was / 2 , as indicated in the legend. Also shown, by the black line, is the Bouguer gravity anomaly, Qo2- The 
density contrasts used were Ai = 2670 kgm -3 and A2 = 630 kgm -3 , respectively. All symbols are listed and explained in Table[T] 



dZ (k) =5r i 1/2 (fc)L- 1 (k)d'H (k) 

have a spectral matrix that is the 2x2 identity, 

{dZ (k)dZ? (k)) = Idkdk'<J(k,k'). 



(80) 



(81) 



4.2.2 In actuality: finite length and discretely sampled 

We now define the vector of Fourier-transformed observations, derived from the actual measurements in eq. (|5) and in (1131 . through eq. d35l l. 



Ho(k) = 



iMk) 
H o2 (k) 



With Wk (k) the Fourier transform of the applied window defined in eq. (|6), and by comparison with eqs (l9t-( 113t . the covariance 



{H (k)H^k')> 



W K (k - k")W"K(k' - k")5 (k") dk" w «S (k) 5(k, k') 



(82) 



(83) 



In comparison to eq. i46\ and eqs ((56) or I l76t , the finite observation window introduces spectral blurring, the loss of separability of the spec- 
tral and lithospheric portions, and small correlations between wave vectors. These we ignored when writing the last, approximate equality, 
introducing the blurred quantity (for a specific window wk , as opposed to eas llOifTTI where we first used the overbar notation) 



5 (k)= // |Wic(k-k')| 5 (k')dk'. 



We denote the Cholesky decomposition of So as 

5 (k) = L (k)Lj(k), 

such that the transformed variable 

Z (k) = L" 1 (k)H (k) 

has unit variance 



(84) 



(85) 



(86) 



(Z (k)Z?(k)> 



(87) 
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4.2.3 In simulations: how to go from the continuous to the discrete formulation 



Correctly generating a data set H that is a realization from a theoretical spectral process dti with the prescribed spectral density <S 
requires ensuring that when we observe a finite samp l e of it, and we form the (tapered) periodogram of this, we get the corr ectly blurred 
spectral density jPercival|[l992l: IChan & Wood! 1 19991 : [Dietrich & Newsamlll993l 1 19971 ; lThomsorJl200ll : iGneiting et ai]|2006h in our case 
eq. d84t . Stability considerations require that should we simulate data on one discrete grid and then extract a portion on another discrete 
grid, we replicate the correct covariance structure everywhere in space and always produce the correct blurring upon analysis. Failure to 
acknowledge the grid properly at the simulation stage can lead to se verely compromised results as will be rea dily experienced but has not 
always been consciously acknowledged in the (geophysical) literature (Peitgen & Saupe 1988; Robin et al. 1993). The method that we outline 
here is variously known as lDavies & Hartd(ll987l) or circulant embedding dWood & Chan|[l994l ; ICraig mile 2003). 

Let us assume that we have a spatial grid x as in eq. I0, and a half-plane Fourier grid k as in eq. l !14b . On the K entries of the latter we 
generate (complex proper) Gaussian variables Z (k) and then transform these as suggested by eqs (186t - d87K 

H (k) = Lo(k)Z (k), (88) 
whereby L is the Cholesky decomposition expressed on the grid k, of eq. d84t calculated on a much finer grid k'. In other words, 



(k) = chol conv||F(k')| 2 ,5 (k')| = chol JJ |F(k - k')| 2 5 D (k') dk' 



= chol[S (k)] 



(89) 



whereby |F(k)| 2 is the unmodified periodogram of the spatial boxcar function that defines the simulation grid. The convolution in eq. d89> is 
to be implemented numerically, with care taken to preserve the positive-definiteness of the result. We now define the discrete inverse Fourier 
transform of this particular set of variables for this fixed set of wave vectors k to be equal to the integral that we introduced in eq. ((3}, 



e ikx d?io(k) 



Eik-x tt 

k 



(k), 



(90) 



which holds, in fact, for any x G R 2 , and is consistent with eq. ([5} which holds for the area of interest picked out by the boxcar window. We 
generate synthetic data sets 'Ho(x) via eqs J88M90t: by this procedure the covariance between any two points x and x' in any portion of 
space identified as our region of interest is now determined to be 



<7*„(x)7#(x')> = 



£k-(x— x ) 



5 o (k)dk = C (x-x') w V k ' (x 



k 



<So(k), 



(91) 



which follows from eqs ( 190) , d46t and d83t with the small correlations between wave vectors neglected, and using the notation introduced in 
eq. d5U . Now eq. d9U is equal to the universal expression in eq. l[8}, consistent with eqs d!0 M12t . and since the dependence is only on the 
separation x — x', stationarity is guaranteed. With x = x' eq. ( 19 1 1 states Parseval's theorem: at every point in space the variance of H a is 
equal to all of its spectral energy. Of course in the isotropic case considered here, Co (x — x' ) = Co ( 1 1 x — x' 1 1 ) , depending only on distance. 

Should we now take the finite windowed Fourier transform of such synthetically generated spatial data H a (x) on a different spatial patch 
(e.g. a subportion from the master set), while using any arbitrary window or taper w K i (x), we will be seeing the correctly blurred version of 
the theoretical spectral density <S , as required to ensure stability. Indeed, when forming a new set of modified Fourier coefficients Hj,(k), 
distinguished by a prime, 



H' (k) =J2w K ,(x)<Ha(x)e 



(92) 



their covariance now must be, as follows directly from eqs ( 1921 ), ( 19 1 b and {6^, the blurred quantity 



(HUk)H' H (k')) 



/\ -ik -x 



;«o(x)tt2V)) 



(93) 



^Wjf'(x)e" 



i(k-k")-x V"^ 



i(k'-k")-: 



'<S (k")dk" 



(94) 



W K ,{k - k")W£/(k' - k")5 (k") dk", 



(95) 



which is exactly as we have wanted it to be c onsistent wit h eq. ( 183b . We will continue to neglect the small correlations between wave vectors, 
but fortunately this will have limited impact ( Varin 2008; Varin et al. 201 1). 

Fig. E] shows a realization of a simulation produced with the method just described. In contrast to Fig. [TJwe now show the result of 
the case where the initial-loading topographies are indeed (negatively) correlated. Evidence for the loading correlation is not apparent to the 
naked eye. 



4.3 The log-likelihood function, £ 

Conditioned upon higher-order moments of the space-domain data being finite (Brilli ngerll975h . their Fourier components are near-Gaussian 
distributed, and for stationary processes, there are no correlations between the real and imaginar y parts of the Fourier transform, whic h are 
independent. Writing M for the Gaussian and Af c for the proper complex Gaussian distributions (Miller 1969 ; Neeser & Massev 1993), and 
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dropping more wave vector dependencies as arguments than before, the observation vectors H (k) in eq. d82b and the rescaled Z (k) of 
eq. d86b are thus characterized at each wave vector k by the probability density functions 

= Af c (0,S o ), pz = Af°(0, 1) , TZe{Z } ~A/"(o, ±l) , and Tm{Z } ~ Af(o, ±l) . 



(96) 



As we have noted at the end of Section |2T| at the Nyquist and zero wave numbers these quantities are real with unit variance. In so writing 
the observation vector is treated as a random variable, but we are interested in the likelihood of observing the particular data set at hand given 
the model, which for us means an evaluation at the data in function of the deterministic parameters a 2 , p, v, D, f 2 , r. This quantity, £(0), 
receives contributions from ea ch wave vector k that, once the number K of considered wave vectors is large enough, can be considered 
independent from one another dDzhamparidze & Y aglom 1983). The log-likelihood is thus, up to a constant, given by the standard form 



m = - 



inn 



exp(- 



-H^Sq 1 Ho 



det So 



k 



K ^ 

k 



(97) 



While we know that there is in fact correlation between the terms £k(0), only at very small sample sizes K will this produce inefficient 
estimators, as the accrued effects of the correlation diminish in importance with increasing sample sizes. At moderate to large sample sizes 
there is considerable gain in computational efficiency and no loss of statistical efficiency due to the fast spectral decay of the blurring kernel 
functions involved. Our objective function, the log-likelihood, remains simply the average of the contributions at each wave vector in the half 
plane. Of course eqs d96M97> contain the blurred spectral forms <S (k) that we defined in eq. d84t . in acknowledgment of the fact that the 
variance experiences the influence from nearby wave vectors: the approximation made asymptotically is that of eq. d83t . but eq. d84t is exact. 

While we cannot ignore this blurring for finite sample size and for the particular da ta tapers used to obtain the windowed Fourier trans- 
forms, for very large d ata sets and wel l-designed, fast-decaying, window functions (e.g. ISimons&Wan3l201lh the observation vectors H 
will converge 'in law' I IFergusonll 19961) to random variables H D that are distributed as complex proper Gaussian with an unblurred variance, 



Ho 



H' ~Af c (0,S o 



in which case we would simply write 
Ph = A/" c (0,So). 



(98) 



(99) 



Working with this distribution is mathematically more convenient since all of the subsequent calculations can be done analytically, and, per 
eq. d76b . separably in the lithospheric and spectral parameters, so we will adhere to it until further notice. In this case the log-likelihood is 



C(0) 



1 

K 



In!] 



exp(-H 1 5o" 1 Ho 



det So 



K ^ 

k 



[21nSu +ln(detTo) + S 1 i 1 Ho 1 To 1 Ho] = -^^£ k (0) 



(100) 



While algorithms for simulation and data analysis will be based on eq. d97b , we will use eq. dlOOb to study the properties of the solution, 
ultimately (in Section[6]and Appendix |9.8b demonstrating why such an approach is justified. On par with eq. dlOO) we introduce an equivalent 
likelihood in whose formulation the correlation coefficient r does not appear, with the notation of eqs j74t-(l75t" and eqs f7gM78l. namely 



C{9) = [ 21n5 n +ln(detT)+Sri 1 H 1 T- 1 Ho] 

k 



(101) 



4.4 The maximum-likelihood estimator, 



The gradient of the log-likelihood, the score function, is the vector 

T 

;;/■ /)/' ;)f ;>i' nr nr 



3D 



(102) 



(103) 



dC dC dC dC dC 
df 2 dr da 2 dv dp 

with generic elements, never to be confused with the coherence functions d64t-d65t, that we shall denote as 

k k 

Following standard theory l|Pawitanll200"lL lDavisonll2003h we define the maximum-likelihood estimate as that which maximizes C(0), thus 
is the vector of the maximum-likelihood estimate of the parameters, for which 

7(0) = 0. (104) 

Contingent upon the requisite second order conditions being satisfied dSeverinil200ll) . this is also assumed to be the global maximum of dlOOt 
in the range of parameters that is allowed to take. We now let 0o be the vector containing the true, unknown values, and have a certain 0' 
lie somewhere inside a ball of radius ||0 — 0o || around it. Then we may expand the score with a multivariate Taylor series expansion, using 
the Lagrange form of the remainder, to arrive at the exact expression 

7(0) = 7 (0o) + F(0')(0-0o), for ||0' - 0„|| < ||0 - O ||. (105) 



The random matrix F is the Hessian of the log-likelihood function, with elements defined by 
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_ d-y e , _ d 2 C 

Fee ' -~df- dew' (106) 

and an expected value —T, the Fisher 'information matrix', 

F{0) = -(F(0)>, withelements T w = -(^gg;)- ( 107 ) 
Hence the name 'observed Fisher matrix' which is sometimes used for the Hessian. If it is invertible we may rearrange eq. (1105b and write 
= e -F- 1 (0')y(0 ). (108) 
For this exponential family of distributions the random Hessian converges 'in probability' to the constant Fisher matrix 

F(0) -A -F{0 ). (109) 

This is more than a statement about means: the fluctuations of F about its expected value also become smaller and smaller. Thus, no 
matter where we evaluate the Hessian, at 0' or at Go, both tend to the constant matrix T ' . The distributional properties of the maximum- 
likelihood estimator c an be deduced from eqs dl08b -( fTo"9l . which are also the basis for Newton-Raphson iterative numerical schemes (e.g. 
iDahlen & Simonsl2008h . We thus need to study the behavior of 7, F, and T '. The symbols of the statistical apparatus that we have assembled 
so far are listed in Table [2] 



4.5 The score function, 7 

Per eqs d 1 02 1 — dT04t the derivatives of the log-likelihood function C vanish at the maximum-likelihood estimate 9. With our representation 
of the unknowns of our problem by the parameter sets Gl and 6s we are in the position to calculate the elements of the score function 7 
explicitly. We remind the reader that these are not for use in the optimization using real data sets where the blurred likelihood C is to be 
maximized instead. In that case the scores of C will need to be calculated numerically. However, the scores of the unblurred likelihood C that 
we present here will prove to be useful in the calculation of the variance of the maximum-blurred-likelihood estimator. Combining eqs 1 11 00b 
through d!03b we see that the general form of the elements of the score function will be given by 

16 = j^^2'ye(k) = [2rng(k) + S^n^AgHo] . (110) 

k k 

For the lithospheric and spectral parameters, respectively, we will have 
l ain(detTo) 0T- 1 

m % (k) = 5n^, A fls = -m % (k)T-\ (112) 

The explicit expressions can be found in Appendices 19. 2li931 For completeness we note here that dS^ /86s = —mg s S^. 

To determine the sampling properties of the maximum-likelihood estimation procedure we use eqs fl99)-( [T03"} to make the identifications 

£k = mp Ho and 7e (k) = — %2., (113) 

Ph 86 

to obtain the standard result that the expectation of the score over multiple hypothetical realizations of the observation vector vanishes, as 

f dPH \ ,„ dff ,„ \ 8 



( 7e (k)) = J 7e (k)p Ho dH = J {-^) fflo = de[J PH = dHo J = dei 1 ) = °- (114) 

In the treatment that is to follow lljohnson&Kotzll973h . we will need to perform operations on multiple similar forms as in eq. dl 10b . namely 

7«(k) = -2m e (k) -Sr/H^AoHo. (115) 

To facilitate the development for the second term in eq. ( II 15b we use eq. {88}, but again without the complications of spectral blurring, see 
eq. ( |80b , and proceed by eigenvalue decomposition of the symmetric matrices L^A^Lo to yield 

S^H^AeHo = Zj i (LjA e Lo)Zo = Zj i (P^A e P e )Zo = (P 9 Zo) H A e (P 9 Zo) = ZrA e Z e (116) 

= A+(k)|i+(k)| 2 + A-(k)|4-(k)| 2 , (117) 

where A^" (k) and A^ (k) are the two possibly degenerate eigenvalues of L^Ae L constructed by combining eqs ((79} and dill t — dTT2~t . 
A± =eig(LjA e L ) . (118) 



Since the matrix Pg is orthonormal, Zg and Zg are identically distributed and thus we find through eq. ( 1961 that eq. dl 17b is a weighted sum 
of independent random variables, each exponentially distributed, xl/2, with unit mean and variance. In summary, we have the convenient 
form for the contributions to the score ( 11 10> from each individual wave vector, 



7 fl(k) = -2m fl (k)-Aj(k)|i e + (k)| -Ae(k)IJ e -(k)| . (119) 
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Since nig is nonrandom we thus have an expectation for the contributions to the score that confirms eq. ( II 14b . namely 

< 7 e(k)) = -2m B (k) - A+(k) - A fl -(k) = 0, 
and a variance given by 

< 7e (k) 7e (k)> = [A+(k)] 2 + [A e -(k)] 2 = var{ 7fl (k)}. 
We also retain the useful expression 

(Sf/H^AeHo) = tr(Lj A9L0) = A^(k) + A^ (k) = -2m e (k). 



(120) 



(121) 



(122) 



Eq. ( 1 1 2 1 1 > gave us the variance of the derivatives of the log-likelihood function with respect to the parameters of interest, which was written in 
terms of the eigenvalues of the non-random matrix Ag Lo. More specifically, for the variances of the scores in the lithospheric parameters 
9 L in L = [D f r] T , we will find 



r{7<fc(k)}=[A£(k)] 3 +[A^(k)] : 



(123) 



whereas for the variances of the scores in any of the three spectral parameters 9s in 0,s = [u v p ] , judging from eq. (Ill 2b . we will need 
the sum of the squared eigenvalues of —m$ s L^T^ 1 L and since L Q is the Cholesky decomposition of T , we have T^ 1 = LJ T L^" 1 and 



var{7e s (k)} = 2m|,(k). 

As to the covariance of the scores in the different parameters we use eqs d 1 13b — (TTX4t to write 



7e' (k) pn a dH c 





= / ^[7e'(k)pH ]dH = 



pn dU + / [7fl(k)7 9 /(k)]p Ho rfH 



(124) 



(125) 



(126) 



and thereby manage to equate the variance of the score to the expectation of the negative of its derivative, 

<7a(k) 7 *'(k)> = -^ 7e ,(k)) = -(H^) = cov{ 7e (k),7 e '(k)}, 

which should of course specialize to verify eq. ( 1121b . giving us two calculation methods for the variance terms. We do not consider any 
covariance between the scores at non-equal wave vectors. 

From eqs dl 10b and d 1 1 9b we have learned that the full score ye is a sum of random variables jg (k) or indeed the j^^(k)| 2 , which 
belong to the exponential family. Between those we consider no correlations at different wave vector s, and eqs (11201 and d 1 26b have given us 
their mean and covariance, respectively. Lindeberg-Feller central limit theorems apply dFellerll 1968b . and so the distribution of the score ye 
will be Gaussian with mean zero and covariance 



{7e,7e'} = 4^ ^2cov{-yg(]<.),-ye'0i.)}. 



K 2 

k 

Using eqs ( 1 126b , i ll 00b and d 1 06b — ( fTOTt we can rewrite the above expression in terms of the diagonal elements of the Fisher matrix, 

/ d 2 c 



Axov{ 7e ,7 9 /} 



{Fge> } — Tei 



(127) 



(128) 



\ 8909' 

We can summarize all of the above by stating that, for K sufficiently large, ignoring wave vector correlations, and through the Lindeberg- 
Feller central limit theorem, the vector with the scores in the individual parameters converges in law to what is distributed as 

s/Ki(0) ~ Af(O,F(0)). (129) 



4.6 The Fisher information matrix, T 

From the definition in eq. d 107b we have that the elements of the Fisher matrix T are given by the negative expectation of the elements of the 
Hessian matrix F, which themselves are the second derivatives of the log-likelihood function £ with respect to the parameters of interest 0. 
Per eq. d 128b the Fisher matrix scales to the covariance of the score 7, and by combining eqs dl23b -l fl"2"4"t with eq. dl 10b or, ultimately, 
eqs J 1 2 1 b and d!27b , we thus find a convenient expression for the diagonal elements of the Fisher matrix, namely 



•^ = ^E var ^ k » = -FE{ [ A 9( k )] 2 + [^(k)] 2 }, 

k 



K ^ 

k 



which, for the spectral parameters specializes to the more easily calculated expression 
T 6s 6 s = ^y^me s Qs). 



(130) 



(131) 



For the cross terms, rather than combining eqs (11 191 and (1127b . we proceed via eq. (1128b and thus require expressions for the elements 
of the Hessian. From eqs d!06b and d 1 10b we derive that the general expression for the elements of the symmetric Hessian matrix are 



Foal — 



die' _ _ J_ 
39 K 2-^ 



(132) 
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description 


eq. 




the true, unknown, parameter set of the problem, consisting of lithospheric and spectral parameters 


JTOTI 


§ 


f n m q yi m l ivn —1 1 k pm i n j~ir\/~i pctimntp nf thf t~\ n t" n m t^tt^r cp*t" 

LUC 1 1 lil A 1 1 1 1 U 1 1 1 "llJvCllllUULl CaLlllldlC *J1 U1C YjtXx CllllClCl SCL 




0,8' 


generic occurrences of the parameter set 




Ol 


the lithospheric parameter set of the estimation procedure, containing D, f ' 2 and r listed in Table[JJ 


ED 


9s 


the spectral parameter set of the estimation procedure, containing a 2 , p and u listed below 


ED 


§ 


the parameter set not including the correlation coefficient r 


CS) 


dUo 


the "theoretical" observation vector, containing final topographies Hoi and Wo2 at both interfaces 


E3 


dn 


the theoretical vector containing initial topographies Hn and H22 at both interfaces 


ED 


s 


the spectral matrix containing the (cross-)spectral densities of the theoretical initial topographies 


ED 


So 


the spectral matrix containing the (cross-)spectral densities of the theoretical final topographies 


ED 


M D 


the matrix that maps the initial-loading spectral matrix S to the final-observed spectral matrix <S 


E3 


Sii 


the power spectral density of the top-loading process, here assumed to be isotropic 


E3 


a 2 


the first quantity in the parameterized Matern form of the spectral density <Sn , to be estimated 


E3 


P 


the second quantity in the parameterized Matern form of the spectral density <Sn, to be estimated 


ED 


V 


the third quantity in the parameterized Matern form of the spectral density <Sn, to be estimated 


1721 


T c 


the "spectral" matrix after factoring the power spectrum of the top-loading process, <Sn , out of So 




T 


the part of To that is independent of the correlation coefficient r between the loads 


nu 




the part of X that depends on the correlation coefficient r between the loads 


J701 


T 


a lower-triangular matrix forming the Cholesky decomposition of To 




Ho 


the "observed" observation vector, containing final topography Hoi and H a 2 at both interfaces 


ED 


So 


the "blurred" spectral matrix, containing the (cross-)spectral densities of the actual final topographies 


ED 


L c 


a lower-triangular matrix forming the Cholesky decomposition of So 


ED 


£(0) 


the likelihood of observing Bouguer gravity and topography under the two-layer flexural model 


ED 


K 


total number of all wave vectors considered, covering the upper half-plane of spectral space 


ED 


k,k' 


generic wavenumbers from the wave vectors k, k' 


ED 


C(0) 


the likelihood of observing Bouguer gravity and topography neglecting spectral blurring 


ill OOt 


C{0) 


the likelihood of observing Bouguer gravity and topography neglecting spectral blurring and load correlation 


(Hon 


la 


an element of the gradient of the likelihood £, or the score function, ~y 


EHD 


Fee 1 


an element of the Hessian of the likelihood C, or the observed Fisher matrix, F 


EH) 


Fee' 


an element of the negative expectation of the Hessian, or the Fisher information matrix, 


ESD 


Jee 1 


an element of the inverse of the Fisher information matrix, {J 


EH} 


X 


quadratic residual surface obtained after maximizing the likelihood 


EiD 


S 


generic isotropic Matern spectral density for univariate fields 


EqD 


Cs 


the likelihood of observing univariate data under the isotropic Matern model 


EHD 


is 


the score of the likelihood C$ 


(209} 


Fs 


the Fisher matrix of the likelihood Cs 


EE} 


X 


maximum-log-likelihood ratio test statistic to evaluate the need for initial-loading correlation 


E2D 



Table 2. Some of the symbols used for the statistical theory presented in this paper, their short description, and equation numbers for context. 



Unless we use it in the numerical optimization of the log-likelihood we only need the negative expectation of eq. J132t . the Fisher matrix 



dm e ,(k) 



+ 2 Sii 



-1&S1 



Illy 



(133) 



where we have used eq. d!22t . Of course, when 8 — 8', the general eq. dl33t specializes to the special case d 1 301 > discussed before. Ultimately 
this equivalence is a consequence of eq. dl26t which held that in expectation, the product of first derivatives of the log-likelihood is equal to 
its second derivative. 

The explicit forms are listed in Appendix 19.41 but looking ahead, we will point to two special cases that result in simplified expressions. 
It should be clear from the separation of lithospheric and spectral parameters achieved in eq. j76) and from eqs (1111 1 — (1 1 12t that the mixed 
derivatives of one lithospheric and one spectral parameter, d$ L me s ~ de s me L = and de L Sn = 0, both vanish, and that we thereby have 



2 ^ 

Sr/H^A^Ho) m fls (k), Te L e s = ^ m e Jk)m % (k) 



(134) 



Finally, we also easily deduce that 



Fa 



-y 

k 



dmedk) 

) ° 

J d8 s 



+ i rng {k)m e , (k) 



d8 s 



(Sn H D T 'Ho) 



2 ^ 

k 
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where we have used the previously noted special case of eq. C22t bv which (Sf/H^T^Hc) = tr (LjT~ 1 Lo) = 2. The previously 
encountered eq. dl31t is again a special case of eq. {[35} when S = 0' s . Both expressions d 1 34b and d 13 5 b are of an appealing symmetry. 
Between them they cover the majority of the elements of the Fisher matrix, which will thus be relatively easy to compute. 



4.7 Properties of the maximum-likelihood estimate, 

We are now ready to derive the properties of the maximum-likelihood estimate given in eq. d!08t , which we repeat here, as 

= 6» o -F- 1 (6>')7(fo). (136) 

From eq. d!29t we know that the score 7 converges to a multivariate Gaussian, and from eq. ( 1109) we know that the Hess ian F converges 
in pro bability to the Fisher matri x J 7 . A Taylor expansion allow s us to replace 0' by 0o as in standard statistical practice dCox & Hinklev 
fl974h . Thus, by Slutsky's lemma ( Severi m1l200ll ; lDavison1l2003l) the distribution of is also a multivariate Gaussian. Its expectation will be 

(0) = 0o, (137) 
showing how our maximum-likelihood estimator is unbiased. Its covariance is 

cov{0} = T-\6 ) cov{ 7 (0 o )} T- T {6 ). (138) 

From eq. d 128b we retain that Kcov{~f(6o)} — J-{0q) and with T — a symmetric matrix, we conclude that the covariance of the 
maximum-likelihood estimator is given by 

Kcov{§} =^ _1 (0 O ), orindeed Kcov{§, §'} = J w (0 O ), where J{0 Q ) = ^ _1 (0 O ). (139) 
In summary, we have shown that 

VK{§ - 6>o ) ~ JV(0, T-\9 )) = 7V(0, J(0 )), (140) 

which allows us to construct confidence intervals on the parameter vector 0. Denoting the generic diagonal element of the inverse of the 
Fisher matrix evaluated at the truth 0o as Jee (0o)> this equation shows us that each element of the parameter vector is distributed as 

h - 6o) ~ Af(0, 1), (141) 



As customary, we shall replace the needed values O with the estimates and quote the lOOxa % confidence interval on 8q as given by 

a _ [el W <r a ^ a , „ {[el (^) nA ^ 

o — z a/2 7= — < t>0 < C + Z a /2 7= — , (142) 



K ~ ~ VK 

where z a is the value at which the standard normal reaches a cumulative probability of 1 — a, i.e. z a / 2 ~ 1.96 for a 95% confidence interval. 

These conclusions, which are exact for the case under consideration, will hold asymptotically when in practice we use the blurred 
likelihood J97t instead of eq. JlOOt . In the blurred case and for all numerical optimization procedures, we expect to have to amend eqs d 1 37b 
and d 1 3 8b by correction factors on the order of K^ 1 and K~ 2 , respectively. Eq. d!42t would receive extra correction terms starting with the 
order K , which would be immaterial given the size of the confidence interval. 

In some sense, eq. ( 11421 ) concludes the analysis of our maximum-likelihood solution to the problem of flexural-rigidity estimation. It 
makes the important statement that each of the estimates of flexural rigidity D, initial-loading ratio f 2 , and load correlation coefficient r, 
will be normally distributed variables centered on the true values and with a standard deviation which will scale with the inverse square-root 
of the physical data size K. Obtaining the variance on th e estimates of effective elastic thickness T e from the estimates of D will be made 
through eq. ( 12 1 1 via the "delta method" jDavisonll2003h . This implies that the estimate of the effective elastic thickness is approximately 
distributed as 

f e ^Af^s 1/s D 1 /3 ,^s 2/3 Do 4/:i ynr{D}j where s = 12(1 - is 2 )/E. (143) 
4.8 Analysis of residuals 

Once the estimate = [Ol §s] T has been found, we may combine it with our observations, and through eq. d86t , form the variable 
Zo(k) = L- 1 (k)|^H (k), (144) 

which should be distributed as the standard complex proper Gaussian A/" c (0, 1). Equivalently, and as a special case of eqs f97t and fiTTl . 
X (k) = Z?(k)Z (k) = H^-^Ho ~ X4/2, (145) 
and these variables should be approximately independent. We can rank order them according to their size, 

= min{A:o(k)} < X^ 2) < ... < X { K) = max{A'o(k)}, (146) 
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Figure 5. The behavior of the quadratic residuals Xq (k) defined in eq. in a recovery simulation for correlated loading. (Left column) Observed 

(histogram) and theoretical xl/2 distribution (black curve) of the residuals Xo(k) across all wave vectors k. (Middle column) Quantile-quantile plot of 
the observed Xo(k) compared to their theoretical xl/2 distribution across all wave vectors k. (Right column) Plot of the observed residuals Xrj(k) in the 
wave vector plane. The examples are for a case where K = 2 X 32 X 32, Ai = 2670 kgm -3 , A2 = 630 kgm -3 , 22 = 35 km, and the sampling intervals 
were 20 km in each direction. The true model is for the correlated case where the lithospheric parameters are D = 1 X 10 24 Nm, f 2 = 0.8 and r = 0.75, 
and the spectral parameters <r 2 = 2.5 X 10 -3 , v = 2, p = 4 X 10 4 . (Top row) A "bad" example where the residuals do not follow the predicted distribution 
and continue to show too much structure in the wave vector domain. For this example, the poor estimate is given by D = 1.5 X 10 24 Nm, / 2 = 0.915 
and f = 0.656, <r 2 = 1.9 X 10 — 3 , v = 1.5, p = 4.25 X 10 4 , and the blurred log-likelihood C = —18.591. (Bottom row) A "good" example which 
indicates that the estimate will be accepted as a fair representation of the truth, which in this case is D = 1.326 X 10 24 Nm, f 2 = 0.790 and r = 0.741, 
a 2 = 2.415 X 10~ 3 , v = 2.00, p = 3.974 X 10 4 . The blurred log-likelihood C = -18.2883. No structure is detected in the residuals: the model fits. 



and inspect the quantile-quantile plot toavisonl [2003h whereby the for all j — 1, . . . , K, are plotted versus the inverse cumulative 

density function of the xl/2 distribution, evaluated at the argument j/(K + 1). If, apart from at very low and very high values of j, this 
graph follows a one-t o-one line, ther e will be no reason to assume that our model is bad for the data. This can then further be formalized 
by a chi-squared test ( Davison 2003), but a plot of the residuals as a function of wave vector will be more informative to determine how 
the model is misfitting the data. In particular it may diagnose anisotropy of some form, or identify particular regions of spectral space that 
poorly conform to the model and for which the latter may need to be revised. Fig.|5]illustrates this procedure on a recovery simulation under 
correlated loading. 

If the method holds up to scrutiny of this type, then because ours is a maximum-likelihood estimator, it will be asymptotically efficient, 
with a mean-squared error that will be as small or smaller than that of all other possible estimators, converging to the optimal estimate as the 
sample size grows to infinity. 



4.9 Admittance and coherence return, briefly 

The t heoretical admittance Q and coherence 7o are nothing but one-to-one functions of our parameters of interest. Consequently jDavisonl 
2003), maximum-likelihood estimates for either Q or 7 2 are obtained simply by evaluating the functions d59t or ( I64t at the maximum- 
likelihood estimate of the parameters. The equivalence is easy to appreciate by expanding the score in the desired function, e.g. 7 2 , as a total 
derivative involving the parameters D, f 2 and r, 

dC_ _ OC dD_ + dC_ df_ + d£dr_ 

c*7 2 dD d-y 2 df 2 dj 2 dr d-y 2 ' 

The score in 7 2 vanishes when dC/dD = dC/df 2 = dC/dr — as long as each of d^j 2 /dD, dy 2 /df 2 and dj 2 /dr are non-zero. 
Thus the maximum-likelihood estimates Q a and 7 2 are obtained at the maximum-likelihood values D, f 2 and r , and are computed without 
difficulty, as we will illustrate shortly. See Appendix |9.5| for a few additional considerations. 
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5 TESTING THE MODEL 

In the previous section we discussed the question whether the "model" to which we have subscribed is at all "valid" in very general terms. 
Here, we will address two possible concerns more specifically. The main ingredients of our model are the flexural equations d20t . corre- 
lation l |49) and proportionality ( 154) of the initial topographies, and the isotropic spectral form ( 172) that we assumed for the loading terms. 
Other than that, we have introduced a certain fixed two-layer density structure Ai, A2 and z%, and an approximate way of computing gravity 
anomalies by way of eq. i26\ . When working within this framework, we showed in Section l4~8l how to assess the quality of the data fit, 
and in Section |4~9l how to hindcast the traditional observables of admittance and coherence. However, what we have not addressed is the 
relative merits of alternative models. How appropriate is the Matern class, especially in its isotropic form? How different would an analysis 
that does not consider correlated loading be from one that does? What would be the effect of modifying or adding additional terms to the 
flexural equations, as could be a ppropriate to consider more complex tectonic scenarios, elastic non-linearities, elastic anisotropy , or alterna- 
tive rh eologies (as, for example, Step henson & Beaumondl980l : IStephenson & LambeclJl98llRibell982l ls warn &Kirbvl20 03a; McK enzid 
l20ld) ? We cannot, of course, address all of these questions with any hope for completeness, but in this section we introduce two specific 
considerations that will speak to these issues. 

The first, detailed in Appendix 19.61 involves a stand-alone methodology to recover the spectral parameters in the Matern form given 
univariate multi-dimensional data. This will help us build well-suited data synthetics; it wil l also enable the study of terrestrial and pla netary 
surfaces per se, e.g. to measure the roughness of the o cean floor or the lunar surface (e. g. iGoff & ArbicfeoiOtlRosenburg et al.llioi 1). Even 
more broadly, it is an approach to characterize texture (Har alickll979l : lc"ohen et alj 199lh in the context of geology and geophysics. Although 
our chosen parameterization 1 1721 permits a wide variety of spectral shapes, we are of course limiting ourselves by only considering isotropic 
loading models. In future work, anisotropic spectral shapes for the loading terms will be considered. 

The second, in Appendix 19.71 is a worked example of how, specifically, the inclusion or omission of the initial-loading correlation 
coefficient, r, may influence the confidence that we should have in our maximum-likelihood estimates obtained with or without it. We might 
construct a likelihood C(6), as in eq. J 100) with all terms (176M78) present, or instead we might force the initial-loading correlation to r = 0. 
This would result in a simpler form that we have called C(6) in eq. (TUT), whereby the parameter r is lacking altogether from the vector 

6= [D f a 2 v pf, (148) 

to be compared with the expression for in eq. l |74) . Since 6 C 0, both models are 'nested': the less complicated model can be obtained by 
imposing constraints on the more complicated model, so that the simpler model is a special case of the more complicated one. In that case 
the likelihood-ratio test dCox & Hinklevl[T974l ;ISeverini 200 1[) that we describe in Appendix 19. 7| is applicab le. It is inappropriate to com pare 
models using likelihood ratios if they are not nested, even if special exceptions exist to that rule (see, e.g.,[Vuong 1989l; lFan et al ]l200lh . 

What we have not done is incorporate the effect of downward continuation in eq. i35l into the analysis. The 'data' that we will generate 
and analyze in our synthetic experiments will have been 'perfectly' downward continued to the single 'appropriate' interface at depth, from 
'noise-free' gravity observations, which remains a very idealized situation. Some problems anticipated with numerical stability might be 
remediated through dedicated robust deconvolution methods, but more generally, giving up this level of idealization for real-world data 
analysis will cause complications that require special treatments. Absent these, our theoretical error estimates will be minimum bounds. 
Keeping in mind that the complications of this kind are shared by other gravity-based methods, we feel justified in not exhaustively discussing 
all of our options here. Nevertheless, we can look ahead at addressing the downward continuation of the gravity field within the framework of 
our maximum-likelihood method by considering what would happen if we took the surface topography and the gravity anomaly as the primary 
observables, rather than the surface and (deconvolved) subsurface topography as we now have, in eq. ( 143) . We would, essentially, continue 
to carry the factors x(k) from eq. ([35} throughout the development. In the application of the blurred data analysis ( 1891 those factors would 
appear inside the convolutional integrals, to appear in Appendix |9.8l of the kind J236t . and their appearance there would no doubt regularize 
the gravity deconvolution by stabilizing the inverse ( 1237) and its derivatives ( 1238) as actually used by the optimization algorithm. However, 
the variance expressions for the maximum-likelihood estimates, which we derive based on the unblurred likelihoods, would presumably be 
farther from their blurred equivalents once the deconvolution is also part of the estimation in this way, and it would require much detailed 
work to arrive at a complete understanding of such a procedure. At the end of the day, we would still not have remediated the geophysical 
problems of measurement and data-reduction noise in obtaining the Bouguer gravity anomalies, nor handled possible departures from the 
two-layer model that may exist in the form of internal density anomalies. The list of caveats is long but again shared among other gravity- 
based methods, over which the maximum-likelihood method has a clear advantage, as we have seen, theoretically, above, and are about to 
show, via simulation, in what follows. 



6 NUMERICAL EXPERIMENTS 

Numerical experiments are straightforward. We generate synthetic data using the procedure established in Sections |4.2. H44.2.3I and then 
employ an iteration scheme along the lines of eqs dl08) -( [T09l : starting from an initial guess we proceed through the iterations k = 0, . . . as 

e k+1 = e k -F-\e k )y(e k ), 049) 

until convergence. In practice any other numerical scheme, e.g. by conjugate gradients, can be used, the only objective being to maximize (or 
minimize the negative) log-likelihood l |97) by whichever iteration path that is expedient, and for which canned routines are readily available. 

The important points to note are, first, that we do need to implement the convolutional blurring step ( 189) in the generation of the data, 
so as to reference them to a particular generation grid while keeping the flexibility to subsample, section, and taper them for analysis as in 
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the real-world case. Second, we do need to maximize the blurred log-likelihood ( |97b and not its unblurred relatives dlOO) or lllOlb. The data- 
generation grid and the da ta-inversion grid may be different. If these two stipulations are not met, an "inverse crime" ( Kaipio & Somersald 
20051 bOoilHanserfe o 1 ) will be committed, leading to either unwarranted optimism, or worse, spectacular failure — both cases unfortu- 
nately paramount in the literature and easily reproduced experimentally. 

From the luxury of being able to do synthetic experiments we can verify, as we have, the important relations derived in this paper, e.g., 
the expectation of the Hessian matrices of eq. | |107) . the distribution of the scores in eq. ( 1128) . of the residuals in eq. (1 145b . of the likelihood 
ratios in eq. ((235} of the forthcoming Appendix 19.81 and of course virtually all of the analytical expressions listed in the Appendices. We 
can furthermore directly inspect the morphology of the likelihood surface ( |97b for individual experiments and witness the scaled reduction 
of the confidence intervals with data size predicted by eq. d!42t . Via eq. d 147b we can compare coherence (and admittance) curves with those 
derived from perfect knowledge, and contrast them with what we might hope to recover from the traditional estimates of the admittance 
and coherence. We do stress again that even if we did have perfect estimates of admittance and coherence, the problem of estimating the 
parameters of interest from those would be fraught with all of the problems, encountered in the literature, that led us to undertake our study 
in the first place. 

Most importantly, we can check how well our theoretical distributions match the outcome of our experiments. After all, in the real world 
we will only have access to one data set per geographic area of interest, and will need to decide on the basis of one maximum-likelihood 
estimate which confidence intervals to place on the solution, and which trade-offs and correlations between the estimated parameters to 
expect. We were able to derive the theoretical distributions only by neglecting the finite-sample size effects, basing our expressions on the 
'unblurred' likelihood of eq. i ll 00) when using eq. ( 197) would have been appropriate but analytically intractable. In short, we can see how 
well we will do under realistic scenarios, and check how much we are likely to gain by employing our approach in future studies of terrestrial 
and planetary inversions for the effective elastic thickness, initial-loading fraction and load-correlation coefficient. 

Figs Q] and [3}[5] were themselves outputs of genuine simulations to which the reader can refer again for visual guidance. Here we limit 
ourselves to studying the statistics of the results on synthetic tests with simulated data. In Figs[6]{9]we report on two suites of simulations: 
one under the uncorrelated-loading scenario for two different data sizes in Figs[6jf7] and one under correlated loading for two different data 
sizes in Figs[8}[9] Histo grams of the outcomes of our experiments are presented in the form of diffusion-based non-parametric 'kernel-density 
estimates' (Botev et al. 2010), which explains their smooth appearance. The distributions of the estimators are furthermore presented in the 
form of the quantile-quantile plots as introduced in eq. ( 1146b . which allows us to identify outlying regions of non-Gaussianity. Figures of the 
type of Fig. [5] should help identify problems with individual cases. 

For the uncorrelated-loading experiments shown in Figs [6j]7] there are few meaningful departures between theory and experiment. The 
predicted distributions match the observed distributions very well, and the parameters of interest can be recovered with great precision. 
Indeed, Fig.[6]shows us that an elastic thickness T e = 43.2 km on a 1260x 1260 km 2 grid can be recovered with a standard deviation of 
2.9 km, with similarly low relative standard deviations for the other parameters. Fig. [7] whose data grid is twice the size in each dimension, 
yields standard deviations on the estimated parameters that are half as big, in accordance with eq. ( 1142) . What is remarkable is that both theory 
and experiment, shown in Fig. [TO] predict that the flexural rigidity D and the initial-loading ratio f 2 can be recovered without appreciable 
correlation between them, and with little trade-off between them and the spectral parameters a 2 , v and p, even though the trade-off between 
the spectral parameters themselves is significant. This propitious "separable" behavior is not at all what the entanglement of the parameters 
through the admittance and coherence curves shown in Fig. [2] would have led us to believe, and it runs indeed contrary to the experience 
with actual data as reported in the literature. The likelihood contains enough information on each of the parameters of interest to make this 
happen; the very act of reducing this information to admittance and coherence curves virtually erases this advantage by the collapse of their 
sensitivities. 

For the correlated-loading experiments shown in Figs |§Jj9]the agreement between theory and experiment is equally satisfactory. The 
introduction of the load-correlation coefficient r contributes to making the maximum-likelihood optimization 'harder'. In our example we 
are nevertheless able to estimate an elastic thickness T e — 17.8 km on a 1260 x 1260 km 2 grid with a standard deviation of only 0.7 km, as 
shown in Fig. [9] In contrast, Fig.[8] whose data grid is half the size in each dimension, yields standard deviations on the estimated parameters 
that are about twice as big, in accordance with eq. J142) . Fig. II llshows the normalized covariance of the estimators. 

In all of our experiments as reported here we implemented the finite-sample size blurring in the data analysis, but made predictions 
based on the unblurred likelihoods, as discussed before. The figures discussed in this section serve as the ultimate justification for the validity 
of this approach, with further heuristic details deferred to Appendix |9.8l When omitting the blurring altogether the agreement between theory 
and practice becomes virtually perfect. As we have argued, though, in those cases we commit the inverse crime of analyzing the data on the 
same grid on which they have been generated, which is unrealistic and needs to be avoided. We also note that in designing practical inversion 
algorithms, care should be taken in formulating an appropriate stopping criterion. The exactness of the computations should match the scaling 
of the variances with the data size, which we showed goes as 1/K in eq. l |128b . This is difficult to tune, and some synthetic experiments 
might inadvertently trim or 'winsorize' the observed distributions by setting too stringent a convergence criterion. 

Figs [12] and [13] to conclude, show the distribution of estimates of the admittance and coherence for the entire set of experiments 
about which we have reported here. The maximum-likelihood estimates agree very well with the theoretical curves, although the effect of 
varying data size on the spread is understandably noticeable. Our initial misgivings about the traditional admittance and coherence estimates 
(obtained by Fourier transformation and averaging over radial wavenumber annuli) are well summed up by their behavior, which shows 
significant bias and larg e variance. While the bias can be taken into account i n comparing measurements with theoretical curves , as it has 
been by various authors dSimons et al.bOOOHPerez-Gussinve et"al1l2004 [20071 120091; iKalnins & Wattj2009l ; lKirbv & SwairfeOl ll) . the high 
variance remains an issue. Multitaper methods dSimons et alj|2003l ; ISimons & Wandl201lh reduce this variance but expand the bias. The 
estimation of admittance and coherence is subservient to the estimation of the lithospheric and spectral parameters that are of geophysical 
value, and all methods that use admittance and coherence estimates, no matter how good, as a point of departure for the inversion for the 
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Figure 6. Recovery statistics of simulations under uncorrected loading on a 64x64 grid with 20 km spacing in each direction. Density interfaces are at 
z\ = km, 22 = 35 km, density contrasts Ai = 2670 kgm~ 3 and A2 = 630 kgm~ 3 . The top row shows the smoothly estimated standardized probability 
density function of the values recovered in this experiment of sample size N, on which the theoretical distribution is superimposed (black line). The abscissas 
were truncated to within ±3 of the empirical standard deviation; the percentage of the values captured by this truncation is listed in the top left of each graph. 
The ratio of the empirical to theoretical standard deviation is shown listed as s/cr. The bottom row shows the quantile-quantile plots of the empirical (ordinate) 
versus the theoretical (abscissa) distributions. The averages of the recovered values D, f 2 ,cr 2 ,u and p are listed at the top of the second row of graphs. The 
true parameter values Do, /g , a 2 , vq and po are listed at the bottom. Assuming Young's and Poisson moduli of i? = 1.4 X 10 11 Pa and v = 0.25, the results 
imply a possible recovery of the parameters as T e = 43.2 ± 2.9 km, f 2 = 0.8 ±0.025, a 2 = (2.5 ±0.2) X 10~ 3 , v = 2 ±0.039, p = (3± 0.0967) X 10 4 , 
quoting the true values plus or minus the theoretical standard deviation of their estimates, which are normally distributed and asymptotically unbiased. 



geophysical parameters, will be deprived of the many benefits that a direct maximum-likelihood inversion brings and that we have attempted 
to illustrate in these pages. 
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Figure 7. Recovery statistics of simulations under uncorrected loading with the same lithospheric parameters and shown in the same layout as in Fig.|6]but 
now carried out on a 128x128 grid. This roughly halves the standard deviation of the estimates, implying a theoretical recovery of T e = 43.2 ± 1.4 km, 
/2 = 0.8 ± 0.013, cr 2 = (2.5 ±0.1) X 10~ 3 , v = 2 ± 0.029, p = (2 ± 0.0273) X 10 4 . As in Fig.|6] the experiments fit the theory encouragingly well. 
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Figure 8. Recovery statistics of simulations under correlated loading on a 32x32 grid. Density interfaces are at z\ = km, 22 = 35 km, density contrasts 
Ai = 2670 kgm" 3 and A 2 = 630 kgm" 3 . The top row shows the smoothly estimated standardized probability density function of the values recovered 
in this experiment of sample size N, on which the theoretical distribution is superimposed (black line). The abscissas were truncated to within ±3 of the 
empirical standard deviation; the percentage of the values captured by this truncation is listed in the top left of each graph. The ratio of the empirical to 
theoretical standard deviation is shown as s/a. The bottom row shows the quantile-quantile plots of the empirical (ordinate) versus the theoretical (abscissa) 
distributions. The averages of the recovered values D, f 2 ,r,a 2 ,u and p are listed at the top of the second row of graphs. The true parameter values Do , /q , rrj, 

: 0.25, the results imply a possible recovery of 
= 2±0.121, p = (2±0.1327) X 10 4 , quoting 



the parameters as T e = 17.8±1.4km, f 2 = 0.4±0.017, 



-0.75±0.014, a 2 - 



1.4 X 10 11 Pa and v ■■ 
(2.5±0.3)xl0~ 3 ,z/ 



the true values plus or minus the theoretical standard deviation of their estimates, which are very nearly normally distributed and asymptotically unbiased. 




D = 0.706 x1e+23Nm f 2 = 0.400 r = -0.750 a 2 = 2.502 x 1e-03 v = 2.003 p = 2.000 x 1e+04 
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D = 0.700 x 1e+23Nm f 2 = 0.400 r = -0.750 o 2 = 2.500 x 1 e-03 v = 2.000 p Q = 2.000 x 1e+04 

Figure 9. Recovery statistics of simulations under correlated loading with the same parameters and shown in the same layout as in Fig.[8]but now carried out 
on a 64x64 grid. This roughly halves the standard deviation of the estimates, implying a theoretical recovery of T e = 17.8 ± 0.7 km, f 2 = 0.4 ± 0.008, 
r = -0.75 ± 0.007, a 2 = (2.5 ± 0.2) X 10~ 3 , v = 2 ± 0.061, p = (2 ± 0.0672) X 10 4 . As in Fig.H the experiments fit the theory very well. 
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Figure 10. Correlation (nonnalized covariance) matrices for the uncorrelated-loading experiments previously reported in Figs[6](top row) and[7](i>offom row). 
(Left column) Theoretical correlation matrices. (Right column) Empirical correlation matrices. There is some trade-off between the lithospheric (D, f 2 ) and 
the spectral parameters (cr 2 , v, p), but virtually none among the lithospheric parameters, while the spectral parameters cannot be independently resolved. 
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Figure 11. Correlation matrices for the correlated-loading experiments reported in Figs[8](top row) Sind\9\(bottom row), with the layout as in Fig.[l0] The 
match between theory and experiment is on par with that found in the uncorrelated case. The covariances between parameters are similar in both cases, with a 
significant trade-off between the lithospheric parameter D and two of the spectral parameters, a 2 and p, which, themselves, cannot be independently resolved. 
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Figure 12. Admittance (left column) and coherence curves (right column) for the uncorrelated-loading experiments reported in Figs [6] (top row) and [7] 
(bottom row). Black curves are the theoretical predictions. Superimposed grey curves, nearly perfectly matching the predictions, are one hundred examples 
of maximum-likelihood estimates selected at random from the experiments. Filled white circles show the "half-coherence" points calculated via eq. 166) . 
Underneath we show medians (black circles) and 2.5th and 97.5fh percentile ranges (black bars) of two hundred "traditional" unwindowed Fourier-based 
estimates of admittance and coherence, highlighting the significant bias and/or high variance of such procedures. 




Figure 13. Admittance (left column) and coherence curves (right column) for the correlated-loading experiments reported in Figs |8] (top row) and [9] (bottom 
row). The layout is as in Fig. 1121 The "traditional" admittance and coherence estimates (black circles, medians, and 2.5th to 97.5th percentiles) once again 
show significant bias and/or variance, although the admittance can be estimated much more accurately than the coherence using conventional Fourier methods. 
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7 CONCLUSIONS 



In this paper we have not answered the geophysical question "What is the flexural strength of the lithosphere?" but rather the underlying 
statistical question "How can an efficient estimator for the flexural strength of the lithosphere be constructed from geophysical observations?". 
Our answer was constructive: we derived the properties of such an estimator and then showed how it can be found, by a computational 
implementation of theoretical results that also yielded analytical for ms for the varia nce of such an estimate. We have stayed as close as 
possible to the proble m formulation as laid out in the classical paper by Forsyth ( 1985) but extended it by fully considering correlated initial 
loads, as suggested bv lMcKen zie (2003). The significant complexity of this problem, even in a two-layer case, barred us from considering 
initial loads with anisotropic power spectral densities, wave vector-dependent initial-loading fractions and load-correlation coefficients, 
anisotropic flexural rigidities, or any other elaborations on the classical theory. However, we have suggested methods by which the presence 
of such additional complexity can be tested through residual inspection. 

The principal steps in our algorithm are as follows. After collecting the Fourier-transformed observations (1821 into a vector H (k) we 
form the blurred Whittle likelihood of eq. ( 197) as the average over the K wavenumbers in the half plane, the Gaussian quadratic form 
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whereby <S is the blurred version, per eq. (184) . of the spectral matrix formulated in eqs d76M78t. The likelihood depends on the lithospheric 
parameters of interest, namely the flexural rigidity D, the initial-loading ratio f 2 , and the load-correlation coefficient r, and on the spectral 
parameters a 2 , v, p of the Matern form d72t that captures the isotropic shape of the power spectral density of the initial loading. Maximization 



of eq. d 1 50b then yields estimates of these six parameters. To appraise their covariance, we turn to the unblurred Whittle likelihood of eq. J 1001 
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its second derivatives (the Hessian), 
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whose inverse relates to the variance of the parameter estimates as 

VK{§ - 6> ) ~ Af(0, T-\e )) = Af(0, J{9 )). 

With this knowledge we construct 100xa % confidence intervals 
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The problem of producing likely values of lithospheric strength, initial-loading fraction and load correlation for a geographic region of 
interest required positing an appropriate model for the relationship between gravity and topography. The gravity field had to be downward 
continued (to produce subsurface topography), and the statistical nature of the parameter recovery problem had to be acknowledged. There are 
many methods to produce estimators, and depending on what can be reasonably assumed, different estimators will result, all with different 
bias and variance characteristics. In general one wishes to obtain unbiased and asymptotically efficient estimators, i.e. estimators whose 
variance is competitive with any other method for increasing sample sizes. Our goal in this work has been to whittle down the assumptions, 
while keeping the model both simple and realistic. 

If the parametric models that we have proposed are realistic then we are assured of good esti mation properti es. Maximum- likelihood 
estimators are both asymptotically unbiased and efficient (often with minimum variance, see, e.g., |Portnoy|[l977l) . Should we use another 
method, with more parameters, or even non-parametric nuisance terms, unless those extra components in the model are necessary, we will 
literally waste data points on estimating needless degrees of freedom, and accrue an increased variance. Modeling the initial spectrum non- 
parametrically is such an example, of wasting half of the data points on the estimation. Producing the coherence or admittance estimate as 
a starting point for a subsequent estimation of the lithospheric parameters of interest is also highly suboptimal, and for the same reason. 
If the parametric models that we have assumed are not realistic then we will be able to diagnose this problem from the residuals, and this 
will be a check on the methods we apply. Hence, if the parametric models stand up to tests of this kind, then because of the properties of 
maximum-likelihood estimators, asymptotically, no other estimator will be able to compete in terms of variance. In that case the confidence 
intervals that we have produced in this paper are the best that could be produced. 
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9 APPENDICES 

9.1 The spectral matrices T, AT and T D 

We restate eqs ||56)-(|58> or eqs ||76)-(|78), without any reference to the dependence on wave vector or wavenumber, as 
T = T + AT, (157) 
/ C 2 + / 2 A?A 2 - 2 -A 1 A 2 - 1 e-/ 2 A i A 2 - 3 ,A / A 2 V 

1 — A A-lf -f2 A 3 A -3j, A2A-2 , r2 A 4 A -4j,2 ~Z j a 7 l U->»J 



-A1A2 £ - / AfAa Ai Aj + f AfA^ (p J \A, + A 2 f 
AT = rf I Xtf&Jt*) r^V^V. (159) 



AfA'^ + l] -2AiA--'<)> y^Ai+AaC 
The Cholesky decomposition l |79l > of T evaluates to 

t (A 1 + A 2 Q- 1 ( A|£ 2 + / 2 A?-2r/AiA 2 £ 

JAU 2 + f 2 A 2 - 2r/Ax A 2 £ l, -A^ ^ + f A^\ + r/A 2 [0£ + 1] /A 2 [0£ - 1] [1 - r 2 ]" 2 1 ' 

For general reference we note the Cayley-Hamilton theorem dDahlen & Baigll2002l) for an invertible 2x2 matrix A, 

A -i_ (trA)I-A 
det A 

The determinants and inverses of T , T and AT are given by 

detT = / 2 Ai(Ai + A 2 £)~ 4 (0£ — l) 2 , (162) 
! A r 2 (A 1+ A 2 C) 2 / 1 + / 2 A 2 A 2 "V Ar 1 A 2 ^ + / 2 A 1 A 2 -\ 

PW-iy I Ar 1 A a £ + / a A 1 AaV A- 2 A 2 ^ 2 + / 2 



(161) 



(163) 



(165) 



det AT = -r 2 detT, (164) 

AT -i Ar 1 Aj 1 (Ai+A a g) 2 / 2<t> A^A 2 [^ + 1]' 

rf(H-l) 2 I^A^A^f + l] 2A" 2 A 2 C 

From these relationships we conclude that 

det To = / 2 Al(Ai+A 2 £)" 4 (</j£-l) 2 (l-r 2 ) = (l-r 2 )detT = detT + detAT, (166) 

T" 1 = (1 - r 2 ) -1 (T _1 - r 2 AT _1 ) . (167) 



9.2 The score 7 in the lithospheric parameters D, f 2 and r 

The first derivative of the log-likelihood function ( llOOt is given by the expression dl lOt . The elements of the score function 7^ for a generic 
"lithospheric" parameter 9l £ 6l ~ [D f 2 r] T are 



-y 



ain(detTo) s-i^mf dT~ 



d0 L 



H c 



- 1 [2me L (k) + S^H^A eL Ho] 



(168) 



We obtain these via eq. seeing that we will need the derivatives of the (logarithm of the) d eterminant and the inverse of To . We 

compute these from their defining expressions or via the identities for symmetric invertible matrices (Strang 199l[ |Tegmark et alii 19971) 



dln(detA) 



tr 



A-^l and 



i)0 



-A-^A- 



(169) 



We will thus also write that 
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dT- 1 2(^1)' 2 / l + fAlA^ + fA.A^-l] A' 1 A 2 + 0/2 - § + /^AJ 1 + i/ 2 [C - 1] \ 

as / 2 D ^A^ + ^-i+fA^ + fflC-l] f+A^Al + A-^^-l], J U/U) 

9T" 1 Ar 2 (A 1+ A 2 Q 2 / 1 A^A^X l v n7n 

"d/ 2 "" / 4 (^-l) 2 U" A ^ Ar 2 A| ? 2 J- 7 V - (171) 

9 AT' 1 2(^1)' 2 ^A^+^-l 1 + 0/2 + 5/2 \ 

<9D r/TJ ^ 1 + 0/2 + ^/2 2A" 1 A 2 + - 1 J ' ^ ; 

From the above we then find that the expressions required by eq. il 1 It to calculate the score in the lithospheric parameters are 

mD . *w+y>, (173) a - - c-^- , (w-* 5 w 1 )' 



1 .... A - /1 ^-i/^T" 1 



mp = ^, (174) A /2 = ( 1 -0-^_ + _AT- 1 ). ,177, 

m ^ = T3^- (175) A r = 7 -i!_ (V 1 - i^AT- 1 ) . ,178) 



(l_ r 2)2 

Since the score vanishes at the estimate, in the uncorrelated case we can solve eq. j 1 68b for the estimate f 2 directly. Using eqs d 1 74t and ( 1177b 
for the case where r — 0, we can thus write, with the help of the matrix V defined in eq. QjTJ, an expression for the estimate 

f 2 = ^^HfVH. (179) 

k 

In principle this would allow us to define a profile likelihood (Pawitan 2001), but such a procedure and its properties remain outside of the 
scope of this text. 

9.3 The score 7 in the spectral parameters a' 2 , v and p 

The elements of the score function 7# s for a generic "spectral" parameter 8s £ 0s — [c 2 v p] T are 

V>* = ~Jc £ (^7^) ( 2 " 5 n lH " T °~ lH °) = [ 2me s( k ) + 5r 1 1 H?A 9s Ho] . (180) 

k k 

To compute these via eq. i ll 12b we need the derivatives of the Matern form. Thus, directly from eq. (1721 . we obtain in particular, 

= -4, (181) A - 2 = -^T" 1 - d 84 ) 

- ^ ^ i+1 °(^)- 4 (^)(^ +tl ) _ '- 1 °(^ +t2 )' <182 ' " """ T -' " 85 ' 

m p = -2- + 8- f ^±1 J f 4^? + fc 2 ] • (183) A p = -m p T-\ (186) 

p p ^ 7r 2 p 2 J \tt 2 p 2 J 

As above in eq. J179t , we pick up one direct solution, namely 

- 2 = ^E(t>)" h " t ° 1h - (187) 

k 

where it is to be noted from eq. ( 172b that (Sn/a 2 ) is inde ed no longer d ependent on a 2 . With eq. l |179b this would enable us to conduct a 
profile-likelihood estimation in a reduced parameter space ( Pawitan 2001), but once again the details are omitted here. 

9.4 The Hessian F and the Fisher matrix J" 

The Hessian or second derivative of the log-likelihood function l llOOb . and its negative expectation or the Fisher information matrix, are given 
by the expressions ( 1132b and ( 1133b . respectively. Both of these contain the terms dl73b -( fT78l and J181b -( fT86l that we have just derived, which 
renders them eminently calculable analytically. In its raw form eq. ( 1133b does not provide much insight, but in Section l4~6l we also introduced 
special formulations for elements of the Fisher matrix that involve at least one spectral variable, in which case the expressions ( 1131b . ( 1134b 
and dl35b for J-~e s e s , F$ L e s and F B 9 i , respectively, are of a common form. We do not foresee needing the e xpressions for the Hes sian: while 
optimization procedures might benefit from those, even in eq. ( |149b the Fisher matrix could be substituted dCox & Hinklev|[T974h . 

We are thus left with determining the entries of the Fisher matrix F@ e i when only lithospheric variables are present. The diagonal 
terms J-g L e L are obtained via eq. ( 1130b . which we repeat here specifically for this case as 



=7^^{ K(k)] 2 +[A fl -(k)] 2 }, where A± = eig (lJA 6l L„) . (188) 

k 
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Only to obtain the cross terms involving different lithospheric parameters do we need the full expression (11331 . Even this case simplifies 
since, owing to eq. J72b . dg L Sn — 0, thereby yielding the expression 



1 v-^ f dm gl (k) 

k 



Li 



f dA g 



(189) 



where we recall from eq. dTTH that d e A e , = dg L d g , T^ 1 . When 9 L £ 6' L , as is seen from eqs 

ETgHnD. the first term dg L m e , = 0. 

When 8l — 0' L , eqs l |188t -l fT89l l are exactly each others' equivalent, and either expression can be used. We will not really need the eigenvalues 
of the quadratic forms: their sums of squares (in eq. 11881 or sums (in eq. 11891 suffice to calculate the elements of the Fisher matrix. The 
specific eigenvalues are only required if we should abandon the normal approximations and develop an interest in calculating the distributions 
of eq. (TT7J exactly. 

Beginning with the flexural rigidity, we obtain 

7ot = IL (i- r2 )(^-i)2 (2/AiA 2 [/ - 3r / - rf - r) 

k 

+ / 2 A 2 [2 + f-r 2 + 2rf] + Al[l + 2/ 2 - r 2 / 2 + 2r/]) . (190) 
For the loading ratio, we obtain for the sum of squares of the eigenvalues 

Finally, for the load-correlation coefficient we conclude that 

For the cross terms that remain, we find, at last, 

^ = i E gP llS) A (k-i) ( 2/Aa ^ rV[Al + Aa] ^ r/2Al + rAa ^ ' (193) 

k 

^ = ^ E g/( 2 i-o(£-i) (/2 Al + Aa - r/[Al + A2]) ' (194) 

k 



9.5 Properties of admittance and coherence estimates — and "Cramer-Rao lite" for the maximum-likelihood estimate 

Let us consider how the uncertainty on the parameters estimated via the maximum-likelihood method propagates to estimates of the 
coherence and the admittance, 7 2 and Q , should we desire to construct those. Since Section l4~7l we have known that our estimate 8, which 
is based on the likelihood < |97b and thus ultimately on the data H (k), is centered on the truth Go as per 

6 = O + Y, and (§) = 9 . (196) 

We know the distributional properties of Y as having a mean of zero and a variance that is proportional to the inverse of the Fourier-domain 
sample size K. Taking the Bouguer-topography coherence as an example, we can again use the delta method to write for its estimate 

7 2 (0) = 7o 2 (0o) + [V 7 o(0o)] T Y, (197) 
from which easily follows that 

<7o(<?)> = 7c(fo), (198) 

var{ 7o 2 (0)} = [V 7 o(0o)] T var{Y}[V 7 o(0o)], (199) 

at identical wavenumbers k, and a statement similar in form to eq. ( 11991 for the covariance of the coherence estimate between different 
wavenumbers k and k' . With these we know the relevant statistics of maximum-likelihood-based admittance and coherence estimates. 

The "traditional" methods use estimates of coherence and admittance to derive estimates of the parameters 8. Regardless of how the 
former are computed (via parameterized maximum-likelihood techniques as in this paper, or non-parametrically using multitaper or other 
spectral techniques), we know one important thing about their statistics. No alternative estimate for the parameters that is unbiased will beat 
the variance of our maximum-likelihood estimate. 

Let us imagine defining another unbiased estimator which would be given by another function of the data, generically written 



t, where (t) = 9o, 



(200) 
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and let us study the covariance of this hypothetical estimate with the zero-mean score of the maximum- likelihood (|97>, defined in eq i ll 10b : 

K 

= i /•■•/ l ire (iiPHow^k)^ = y...y tj\ mo ^dH {k) cm 

K K 
= i^ = ^ = Z- (203) 

To obtain eq. d201t we followed an argument as in eqs dl 13t -( fTT4l while continuing to assume the independence of the Fourier coefficients 
and using Leibniz' product rule of differentiation. We now know from Cauchy-Schwartz that 

var{ 7e }var{£} > (cov{ 7fl ,t}) 2 = (204) 
and thus, combining eq. d204t with eqs d!28t and ( 11391 ), we find that 

var{i} > — 1-^ =^4~= var{^}. (205) 
The maximum-likelihood estimate is asymptotically efficient: no other unbiased estimate has a lower variance. 



9.6 Retrieval of spectral parameters 

Were we to observe a single random field 'H(x), distributed as an isotropic Matern random field with the parameters 9 — 9s, we would have 



<tf«(k)tfH*(k')} = S(k)dkdk'5(k,k') = S(fc)dk = 



ty(typ) 2 



4U ^2 



dk. 



(206) 



Its parameters could also be estimated using maximum-likelihood estimation. Following the developments in Section l4~3l the blurred log- 
likelihood of observing the data under the model d206i would be written under the assumption of independence as 



Cs(9 s 



1 

K 



in!] 



exp(-5" 1 (fc)lJ/(k)| 2 ) 
S(k) 



^^[mSW+S-^li^k)! 2 ] 



When the spectral blurring is being neglected, the likelihood becomes, more simply, 



Cs(9,s 



1 

K 



mn 



exp(-5~ 1 (fc)|jj'(k) 
S(k) 



-i^[ln5(fc) + 5- 1 (fc)|//(k)| 2 ] 



The scores in this likelihood are then 

(isk—^E^W^ 5 ^)^)! 2 ]' where rn es (k)=S-\k)^^., 



(207) 



(208) 



(209) 



which is only slightly different from the forms that they took in the multivariable case, eqs dl lOt and dl 12t , In deriving the variance of the 
score in the multivariate flexural case, eq. 1 11271 . we neglected the complications of spectral blurring, as we do here, and we also neglected 
the slight correlation between wavenumbers, as we have here also. The simple form of eq. d209t allows us to re-examine the effect that 
wavenumber correlations will have on the score by bypassing the development outlined in eqs dl 16t — dTTTt and writing instead that 

k k' 

Previously we wrote expressions for the covariance of the finite-length spectral observation vector that took into account the blurring 
but not the correlation, e.g. in approximating eq. (|9) by eq. d83t , which we restate here for the univariate case as 

cov{#(k), ff(k')} = JJ]V K (k- k")W£(k' - k")5(k") dk" w 5(k) 5(k, k'). (21 1) 

We shall now approximate this under slow variation of the spectrum, relative to the decay of the window functions Wk, as 
cov{#(k), #(k')} « <S(k) J J W K Qt - k")W£(V - k") dk" = 5(k) c(k, k'). (212) 
Using Isserlis' theorem dlsserlislll916l : IPercival & Waldenll 19931 : IWalden et al.lll994h . we then have for the covariance of the periodograms 
cov{|ff(k)| 2 ,|#(k')| 2 } = |cov{ J ff(k),J/*(k')}| 2 + |cov{J/(k),^(k')}| 2 =5 2 (k)c 2 (k,k'), (213) 
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since t he first term , the pseudocovariance or relation matri x vanishes in the half-plane for the complex-proper Gaussian Fourier coeffi- 
cients jMilleJI 19691 : lThomson|[l977l ; iNeeser & Massevll 19931) of real-valued stationary variables. We may thus conclude that the covariance 
of the scores suffers mildly from wavenumber correlation, 

cov{ (7s)es , (7sH } 4EE m #^w^r- w 

k k' 

However, for very large observation windows or custom-designed tapering procedures, we may write 

cov{(7s)e s ,(7s)^} = X] m » s ( k ) m e' s ( k )- ( 215 ) 

k 

From eq. J 1 28t we then also recover the entries of the Fisher matrix for this problem as exactly half the size of the multivariate equivalent 
that we obtained in eq. J135t . as expected, 

{Fs)e s 0< s = -^^2me s (k)m e , s {k), (216) 

k 

which are to be used in the construction of confidence intervals for the parameters a 2 , p and v of the isotropic Matern distribution as 
determined by this procedure. The expressions for mg s were listed in Appendix 19.31 Refer again also to Table [2] which we have only now 
completed filling. 



9.7 Testing correlation via the likelihood-ratio test 

We seek to evaluate the null and alternative hypotheses 

je : r = versus M{ : r / 0. (217) 

Our definition of the log-likelihood C(0) in eq. d 100b included the correlation coefficient r between initial-loading topographies as a param- 
eter to be estimated from the data. In contrast, the log-likelihood C(0) = C([0 T 0] T ) of eq. dlOU did not. The Hessian of C is F and that 
of C is F, and from eq. d!09t we know that F converges in probability to the negative Fisher matrix — J 7 and, similarly, F converges to the 
constant — jF. This gives us the elements to evaluate the different scenarios. 

Should we evaluate "uncorrelated data" using a "correlated model", we need a significance test for the addition of the correlation 
parameter. Since the hypotheses 1 1217) refer to nested models, containing some of the same entries as 0, see eqs J74M75). otherwise put 

= [0 T r] T , (218) 

standard likelihood-ratio theory dCox & Hinkley|[T974l) applies. Let the truth under ,Wq be given by the parameter vector 

6>o = [0o Of, (219) 

and let us consider having found two maximum-likelihood estimates, 

= argmax£(6>) = [0 T f] T , (220) 
U = argmax£(0) / 0. (221) 

Note that C(0) < C{0) and and that the estimates of 'everything-but-the-correlation-coefficient' are different from the full estimates de- 
pending on whether the correlation coefficient is included as a parameter to be estimated or not. We now define the maximum-log-likelihood 
ratio statistic from the evaluated likelihoods 

X = 2K[C{0)-C{0 U )] =2K[C{6)-C{6 u )+C{e Q )-C{6 Q )) = Xi - X 2 , (222) 
whereby we have used that, evaluated at the truth under ^o, the likelihood values £(0o) = £(0o), and defined the auxiliary quantities 

X 1 =2K[jC(0)-C(0 o )], and X 2 = 2K [£(0 U ) - £(§o)] ■ (223) 
By Taylor expansion of the log-likelihoods around the truth, to second order and with the first-order derivatives vanishing, we then have 

Xi -Vk[0-0 o ] T F(0o)[0-0o]VK and X 2 -«/K[0 u - Q ] T F(0~o) [K - 6o] VW, (224) 

where we have used the limiting behavior ( 1109) , For more generality, we consider maximum-likelihood problems with a partitioned parameter 
vector 

= [0 T 0j] T , (225) 

whereby X may contain any number of extra parameters, X = [r] bei ng the case under con sideration. Introducing notation as we go along, 
the Fisher matrix for such problems partitions into four blocks (see also lKennett et all 1998) such that we can write, 



(226) 
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The submatrices J^x and J- a contain the negative expectations of the second derivatives of the likelihood £, with respect to at least one of 
the 'extra' parameters 9 X £ 9 X , suitably arranged with the mnemonic subscripts x and o. The corner matrices F and 7- contain the second 
derivatives of the likelihood £ in only the 'simpler' subset of parameters 8 £ 9. The inverse of the Fisher matrix is given by 

/T-— 1 I J Xo •> xo J X J a I rWT\ 

F =1 ^-i^r^- 1 tt-i I) ( 221 > 



thereby defining the auxiliary matrices, and, via the Woodbury identity, their inverses, as 

= t-t x t- x tZ 



T, a = T-T x T: l T^ and 3^=? + T T^T^T , 
:F ox = To-TlT^T* and = + ?Z T X o T X T^ ■ 

This yields the variances of the vectors partitions. Recalling from eq. d 14-Ot that 
V / F(0-0o)~AT(O,^- :l (0o)), 

we may use eqs ( 1227b — f228t to express the marginal distribution of the partition 9 X under the null hypothesis, 
VK9 X ~JV(0,Jv 1 (Oo)). 



In this general framework we rewrite likelihood-ratio statistic l !222b with the help of eqs J224t - d225t as 

T 



x = x 1 -x 2 ~ -Vk 



tT 



T x 



(228) 
(229) 

(230) 

(231) 

(232) 



In order to figure out the properties of the likelihood-ratio test we now need to understand the properties of the difference between the 
'correlated' and 'uncorrected' estimates 6 — 9 U of eqs d220b — (f22TT l . We may note directly from Cox & H inklevl l l 19741) that 



§u — § + J~x Ox ■ 

Inserting this relation into eq. 1 12321 the limiting behavior of the likelihood-ratio test statistics becomes 



X 



-T 

K0 X 



To x VK 



K 9 X T ax 9 X 



K. 



(233) 



(234) 



where we have used eq. ( |229t . From eq. ( 123 It then follows that the distribution of X is the sum of squared zero-mean Gaussian variates 
divided by their variance, i.e., chi-squared with as many de grees of free dom as the difference in number of parameters between the alternative 
models des cribed by 9 and 9, a conclusion first reached by Wilksl (1 1 93 8h ■ For a derivation rooted in the geometry of contours of the likelihood 
surface, see lFan et al.l (2000). 

In our particular case, the only complementary variable is the correlation r between the two initial-loading terms, and the likelihood-ratio 
test statistic of eq. ( 1222b becomes 



2 

Xl: 



X = 2K[C(§) -£(§*)] 
which is how we may test the alternative hypotheses of initial-load correlation and absence thereof. 



(235) 



9.8 A posteriori justification for the behavior of the synthetic tests 

We owe the reader a short theoretical justification of why using the unblurred likelihoods C of eq. d 100b for the variance calculations (the 
black curves in Figs.[6}{9} accurately predicts the outcome of experiments (the grey-shaded histograms) conducted on the basis of the blurred 
likelihoods £ of eq. \91\ . The blurring enters through the spectral term, which is So instead of So as we recall from eq. ( 184) , and it affects 
the likelihood l |97t through its determinant and inverse. Instead of the purely numerical evaluation of the convolutions of the type d89t and 
conducting all subsequent operations on the result, which is how we construct £ in the numerical experiments, in principle, in the notation 
suggested by eqs l|45M46t, we could attempt to explicitly evaluate, though this would be cumbersome, 

det5 (k) = JJJJ |lV(k-k')| 2 |W(k-k")| 2 [Soii(k')So2 2 (k") - «S 012 (k')«S o2 i(k")] dk'dk", (236) 
for the determinant. For the inverse (see eq. |16U , we might calculate 



5o _1 (k) = 



1 



W(k-k') 



det«So(k) 
and construct derivatives of the kind 



5 022 (k') 
-5o2i(k') 



dSo (k) 



det 5 Q (k) 



— £> (k) + 



-5o 12 (k') 
5on(k') 



IW(k-k') 



dk', 



9e5o22(k') 
-9e5o2i(k') 



-d e S 012 (k') 
9e5on(k') 



dk' 



(237) 



(238) 



Of course, should the spectral windows be delta functions, eqs < !236b — d237t would reduce to Sli det To and iSj^ 1 !^ 1 (see eqs ll66[fl67b . 
as expected on the basis of eq. ( 1761 ). With these expressions, we could proceed to forming the first and second derivatives of the blurred 
likelihood (see eqs ll68l[l69t . For example, for the score in the blurred likelihood we would then have 
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K ^ 

k 



gln(det So) t 



fdS- 1 



O I 



k 



tr 5, 



-i dSo 

de 



=_i95 =_i 

~<->o <->o 



H 



(239) 



and then the derivatives of eq. d239t would be needed to determine the variance of the maximum-blurred-likelihood estimate in a manner 
analogous to eqs J128t and d 1 39ft - 

In short, a full analytical treatment would be very involved, and a purely numerical solution would not give us very much insight. 
How then can we understand that we can approximate the variance of our maximum -blurred-likelihood e stimat or by replacing the second 
derivatives of the blurred likelihood with those of its unblurred form? We can follow IPercival & Waldenl (Tl993) and regard the blurring as 
introducing a bias given by, to second order in the Taylor expansion, 



5o(k)-5 (k) = // |VMk-k')| 2 [So(k') -5 Q (k)] dk' = // | W K Qt') f [<S D (k + k') - 5 (k)] dk' 



W K (k')\ 2 k' T {w T 5o| k }k' 
dk' I tr 



dk' = tr ^ - 



VV T 5 



I War (k') I 



k'k' T 



dk' 



tr< - II \WkQl' 



k'k' T 



VV T 5 



(240) 
(241) 
(242) 



where we have used the hermiticity and periodicity of both the spectral density So and the spectral window | Wk \ , and the evenness and 
energy normalization of the latter. For more general (e.g. non-radially symmetric or non-separable) windows the equations will change, but 
not the conclusions. The first factor in eq. 112421 is a measure of the bandwidth of the spectral window, which we shall call /3 2 (W), and the 
second is a measure of the spectral variability via the curvature of the spectral matrix. Thus the blurred spectral matrix is the sum of the 
unblurred spectral matrix and a second term which decays much faster with wavenumber than the first: 



, (k) = So (k) + /3 2 ( W) VV T 5 (k) . 



(243) 



The matter that concerns us here is how the blurring affects the derivatives of the blurred spectrum and thus the derivatives of the blurred 
likelihood. What transpires is that the differentiation with respect to the parameters 8 does not change the relative order of the terms in 
eq. l |243t , in the sense that the correction terms are only important at low values of the wavenumber k. 

Since the mean score is zero, by virtue of eq. I ll 14) . the correction term becomes important, which leads to a bias of the estimate. But 
since the variance of the score is not zero, see eq. ( 1128) , the correction term is dwarfed by the contribution from the unblurred term. Hence we 
should, as we have, use the blurred likelihood ((97} to conduct numerical maximum-likelihood experiments on finite data patches, but we can, 
as we have shown, predict the variance of the resulting estimators using the analytical expressions based on the unblurred likelihood J 1 001) . 



